Unblocking a stuck Lab Manager Environment (the hard way)

This is a post so I don’t forget how I fixed access to one of our environments yesterday, and hopefully it will be useful to some of you.

We have a good many pretty complex environments deployed to our lab hyper-V servers, controlled by Lab manager. Operations such as starting, stopping or repairing those environments can take a long, long time, but this time we had one that was quite definitely stuck. The lab view showed the many servers in the lab with green progress bars about halfway across but after many hours we saw no progress. The trouble is, at this point you can’t issue any other commands to the environment from within the Lab Manager console – it’s impossible to cancel the operation and regain access to the environment.

Normally in these situations, stepping from Lab Manager to the SCVMM console can help. Stopping and restarting the VMs through SCVMM can often give lab manager the kick it needs to wake up. However, this time that had no effect. We then tried restarting the TFS servers to see if they’d got stuck, but that didn’t help either.

At this point we had no choice but to roll up our sleeves and look in the TFS database. You’d be surprised (or perhaps not) at how often we need to do that…

First of all we looked in the LabEnvironment table. That showed us our environment, and the State column contained a value of Repairing.

Next up, we looked in the LabOperation table. Searching for rows where the DataspaceId column value matched that of our environment in the LabEnvironment table showed a RepairVirtualEnvironment operation.

In the tbl_JobSchedule table we found an entry where the JobId column matched the JobGuid column from the LabOperation table. The interval on that was set to 15, from which we inferred that the repair job was being retried every fifteen minutes by the system. We found another entry for the same JobId in the tbl_JobDefinition table.

Starting to join the dots up, we finally looked in the LabObject database. Searching for all the rows with the same DataspaceId as earlier returned all the lab hosts, environments and machines that were associated with the Team Project containing the lab. In this table, our environment row had a PendingOperationId which matched that of the row in the LabOperation table we found earlier.

We took the decision to attempt to revive our stuck environment by removing the stuck job. That would mean carefully working through all the tables we’d explored and deleting the rows, hopefully in the correct order. As the first part of that, we decided to change the value of the State column in the LabEnvironment table to Started, hoping to avoid crashing TFS should it try to parse all the information about the repair job we were about to slowly remove.

Imagine our surprise, then, when having made that one change, TFS itself cleaned up the database, removed all the table entries referring to the repair environment job and we were immediately able to issue commands to the environment again!

Net Writer: A great UWP blog editor

I came across Net Writer some months ago, when it’s creator, Ed Anderson blogged about how he’d taken the newly-released Open Live Writer code and used it in his just-started Universal Windows Platform (UWP) app for Windows 10. In January it only supported blogger accounts, which meant that I was unable to use it. However, I checked again this weekend and discovered that it now supports a wide range of blog software including BlogEngine.net that powers blogs.blackmarble.co.uk.

I’m writing this post using the app. It’s great for quick posts (there’s no plugin support so posting code snippets is tricky) and most importantly, it works on my phone! That’s the big win as far as I’m concerned. I’ve been hankering for the ability to easily manage my blog form my phone for a long time and now I can.

You can find Net Writer in the Windows Store and learn more about it at Ed’s blog.