But it works on my PC!

The random thoughts of Richard Fennell on technology and software development

My session today at Modern Jago

Thanks to everyone who came along to the Microsoft event today at Modern Jago. I hope you all found it useful. I got feedback from a few people that my tip on not trusting company WiFi when trying to do remote debugging of Windows RT devices (or any other type of device for that matter) was useful.

I have seen too many corporate WiFi implementations, and a surprising number of home ADSL/WiFi routers, doing isolation between WiFi clients. So each client can see the internet fine, but not any other WiFi devices. My usual solution is what I did today: use a MiFi or phone as a basic WiFi hub, as they are both too dumb to try anything as complex as client isolation. Alternatively, look on your WiFi hub to check if you can disable client isolation.
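A quick way to tell whether client isolation is what is stopping you is to try a plain TCP connection from the development PC to the device. The following is a minimal sketch, not part of the original talk; the function name `can_reach` is just illustrative, and the address and port you pass in are up to you (use whatever port your remote debugging monitor reports it is listening on).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

For example, `can_reach("192.168.1.50", 4016)` (a hypothetical device address and port) returning False while the device can still browse the internet is a reasonable hint that the access point is isolating clients, although a firewall on the device would look the same.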

More on HDD2 boot problems with my Crucial M4-mSATA

I have been battling my Crucial M4-mSATA 256GB SSD for a while now. The drive seems OK most of the time, but if for any reason my PC crashes (e.g. a blue screen, which I have found is luckily rare on Windows 8) the PC will not start up, giving a ‘HDD2 cannot be found’ error during POST.

I had not had this problem for a few months, so thought it was fixed, then BANG yesterday Windows crashed out of the blue (I was writing a document in Word whilst listening to music, not exactly a huge load for a Core i7) and I hit the start-up problem. Of course I had been working on the document all afternoon and was relying on auto-save, not doing a real Ctrl S save to a remote network drive, so I expected to have lost everything.

A few attempts at a reboot, using tricks that worked in the past, got me nowhere. After a bit more digging in forums I found this new process suggested as a ‘fix’ from Crucial

  1. Plug the system into the mains, then start the system. You will get the disk not found error; go into the BIOS settings
  2. Leave the PC running, but doing nothing, for 20 minutes. As you are in the BIOS there will be no activity for the SSD, which gives it a chance to do a self test and sort itself out.
  3. Switch off the system, unplug from the mains and pull the battery out for 30 seconds
  4. Plug the system back in and hopefully it will restart without error
  5. If not, repeat steps 1 – 4 until you have had enough.

Well this process got me going, and it does sort of fit with the procedures I had tried before, as they all gave the SSD time to self test after a crash. However, I really needed a better fix; this is my main PC and it needs to be reliable. So I checked to see if there were any new firmware releases from Crucial, and it seems there is one. I had 04MF and now there is 04MH. Version 04MH includes the following changes:

  • Improved robustness in the event of an unexpected power loss. Significantly reduces the incidence of long reboot times after an unexpected power loss.
  • Corrected minor status reporting error during SMART Drive Self Test execution (does not affect SMART attribute data).
  • Streamlined firmware update command for smoother operation in Windows 8.
  • Improved wear leveling algorithms to improve data throughput when foreground wear leveling is required.

So well worth a try it would seem. The only issue is that my SSD is BitLocker encrypted; was this going to be a problem? It takes ages to remove the encryption and reapply it.

Well I thought I would risk the update without changing BitLocker (as I had now got the important data off the SSD). So I

  1. Downloaded the Windows 8 firmware tool and current release from Crucial.
  2. Ran it, it warned about backups, and BIOS encryption (which had me a bit worried, but what the hell!)
  3. Accepted the license
  4. Selected my SSD and told it to upgrade
  5. And waited……..
  6. And waited…….., the issue is the tool does not really give you much indication you actually hit the update button, and disk activity is also very patchy. Basically the PC looks to have hung.
  7. However, after about 5 minutes the application came back, tried to run again (as I had pressed update twice) and promptly crashed. However, it had done the upgrade.
  8. I re-ran the tool and it told me the drive was now at 04MH

I rebooted the PC and all seemed OK, but only time will tell.

TF237111 errors when trying to add work items to the backlog after TFS 2012 QU1 is applied

[Updated 4 Feb 2013 See http://blogs.msdn.com/b/bharry/archive/2013/02/01/hotfixes-for-tfs-2012-update-1-tfs-2012-1.aspx for the latest on this ]

I posted earlier in the week about my experiences with the post TFS 2012 QU1 hotfix. When I posted I thought we had all our problems sorted. We did for new team projects, but it seems we still had an issue for teams on our team projects that were created prior to the upgrade from RTM to QU1. As I said in the past post, we got into this position by trying to upgrade a TPC from RTM to QU1 by detaching from the 2012 RTM server and attaching to a 2012 QU1 server – this is not the recommended route and caused us to suffer the problem the KB2795609 patch addresses.

The problem we still had was as follows:

  • I have two users in a Team Project called ‘BM’ who are in the team called ‘Bad TP’
    • Richard (the Team project creator and administrator)
    • Fred (a Team Project contributor)
  • All is fine for Richard, he can see the team’s product backlog and add items to it.
  • Fred can get to the team backlog page in the web client, but cannot see any work items and gets a TF237111 error if they try to add a new work item

image

  • The quick fix was to make Fred a team project administrator, but this is not a long term solution
  • We checked the following rights
    • Richard was a member of basically all the groups on the ‘BM’ team project (he was the creator so that was expected), the important ones were [BM]\Project administrators, [BM]\contributors and ‘Bad TP’
    • Fred was a member of the [BM]\contributors group and the ‘Bad TP’ team

clip_image001[6]

    • The ‘Bad TP’ team had the following permissions

clip_image001

So all these permissions looked OK, as you would expect. What I had forgotten was that the team model in TFS 2012 is built around the Areas hierarchy. This has security permissions too. To check this

  • Go to the Admin page for ‘Bad TP’
  • Click the “Areas” tab
  • Right click the “default area” for the team and select “security”
  • We had expected to see something like this

image

  • However there was no entry at all for the Contributors group.
  • I added this in and had to explicitly set the four ‘inherited allow‘ permissions to ‘allow’ and everything started to work.

So the problem was that during the problematic upgrade we had managed to strip off all the contributor group entries from the areas in the existing Team Project. The clue was actually in the TF237111 error, as this does mention that permissions are on the area path.

So now we know we can fix the issue. It should be noted that any new teams created in the team project seem not to get this right applied, so we have to remember to add it when we create a new team.

Incorrect IIS IP Bindings and TFS Server Url

By default the TFS server uses http://localhost:8080/tfs as its Server URL. This is the URL used for internal communication, whereas the Notification URL is the one TFS tells clients to use to communicate with it. Both these URLs can be changed via the Team Foundation Server Console, but I find you do not usually need to change the Server URL, only the notification one.

image

I hit a problem recently on a site where if you tried to edit the Team Project Collection Group Membership (via the web or TFS admin console) you got a dialog popping up saying ‘HTTP 400 error’. Now this, you have to say, looks like a URL/binding issue: the tools cannot find an end point.

Turns out the issue was that there had been an IP addressing scheme change on the network. The different services on the network had been assigned their own IP addresses (as well as the host having its own IP address) e.g. on our TFS server we might have

  • 10.0.0.1 – physicalservername.domain.com
  • 10.0.1.1 – tfs2012.domain.com
  • 10.0.1.2 – sharepoint.domain.com

This is all well and good, but a mistake had been made in the bindings in IIS during the reconfiguration.

image

The HTTPS binding was correct: the hostname matched the IP address, which has to be the case else SSL does not work. However, the HTTP port 8080 should have been bound to all IP addresses (i.e. no hostname and the * IP address as above). On the site, HTTP was bound to a specific IP address. This was fine if a client connected to http://tfs2012.domain.com:8080/tfs (which resolved to the correct address), but failed for http://localhost:8080/tfs as the binding did not match.

Once the edit was made to remove the hostname all was OK (the other option would have been to alter the server Url to match)
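To make the mismatch concrete, here is a much simplified model of how a binding is matched to an incoming request. It is only a sketch for illustration (IPv4 only, and the names `Binding`, `parse_binding` and `matches` are my own, not any IIS API). It uses the `ip:port:hostname` form in which IIS stores binding information, and the "broken" binding shown is a hypothetical reconstruction of the kind of entry described above.

```python
from typing import NamedTuple


class Binding(NamedTuple):
    ip: str    # "*" means all addresses on the server
    port: int
    host: str  # "" means any host name is accepted


def parse_binding(info: str) -> Binding:
    """Parse binding information such as '*:8080:' (IPv4 only in this sketch)."""
    ip, port, host = info.split(":")
    return Binding(ip or "*", int(port), host.lower())


def matches(binding: Binding, dest_ip: str, port: int, host_header: str) -> bool:
    """Return True if a request to dest_ip:port with this Host header fits the binding."""
    ip_ok = binding.ip == "*" or binding.ip == dest_ip
    host_ok = binding.host == "" or binding.host == host_header.lower()
    return ip_ok and binding.port == port and host_ok


# Hypothetical reconstruction of the misconfigured and corrected HTTP bindings
broken = parse_binding("10.0.1.1:8080:tfs2012.domain.com")
fixed = parse_binding("*:8080:")

# A client using the public name still matches the broken binding
print(matches(broken, "10.0.1.1", 8080, "tfs2012.domain.com"))  # True
# TFS talking to itself via http://localhost:8080/tfs arrives on loopback
print(matches(broken, "127.0.0.1", 8080, "localhost"))  # False
print(matches(fixed, "127.0.0.1", 8080, "localhost"))   # True
```

When nothing matches, the request is rejected before it ever reaches TFS, which fits with the unhelpful HTTP 400 we were seeing.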

So problem fixed. The strangest thing is that this issue only appeared to affect setting TPC group membership; everything else was fine.

Experiences applying TFS 2012 QU1 and its subsequent hotfix

Brian Harry posted last week about a hotfix for TFS 2012 QU1 (KB2795609). This should not be needed by most people, but as his post points out, it does fix issues for a few customers. Well, we were one of those customers. When upgrading from 2012 RTM to 2012 QU1 we had attempted what with hindsight was an over-ambitious hardware migration too. This involved swapping our data tier from a SQL 2012 instance to a new 2012 availability group and merging team project collections from different servers, as well as applying QU1. Our migration plan contained some team project collection detach/attach steps, hence getting into the area this hotfix addresses.

The end point was we ended up with a QU1 upgraded server, but we could only get users connected if we made them team project administrators, a valid short term solution, but something we needed to fix.

We therefore applied the new KB2795609 patch, but hit a gotcha that you should be aware of

  • We ran the patch EXE on our TFS server that was showing the problem.
  • This ran without error, taking about 5 minutes
  • We tried to connect to the patched TFS server via the web client and VS2012; we could make a connection to TFS but could not open any TPCs
  • On checking the TFS admin console we saw the TPC was offline and reporting that the servicing had failed (but this had not been reported back via the patch tool)
  • We reran the servicing job (via the TFS admin console) but it failed in the core step, where we saw the following in the logs

[Error] TF400744: An error occurred while executing the following script: TurnOnRCSI.sql. Failed batch starts on the line 1. Statement line: 1. Script line: 1. Error: 5069 ALTER DATABASE statement failed.

  • Our TFS DBs are now stored in a SQL 2012 availability group. During the upgrade to QU1 we had seen problems applying the upgrade unless we removed the DBs from the availability group, so we removed tfs_configuration and tfs_[mytpc] from the availability group, reapplied the servicing job and all was OK
  • Once the servicing of the TPC was completed it went online as expected.
  • We then put the DBs back into the availability group
  • We could then remove the users from the team project administrators group as their previous rights were working again.

So we now had a patched and working TFS 2012 QU1 server. Let’s hope that QU2 is a little smoother and we don’t need the direct help of the product group, who I must say have been great in getting this problem addressed. I really like the openness we see in Brian’s blog about both the good and the bad.

Why can’t I create an environment using a running VM on my Lab Management system?

With TFS Lab Management you can build environments from stored VMs and VM templates held in an SCVMM library, or from VMs running on a Hyper-V host within your lab infrastructure. This second form is what used to be called composing an environment in TFS 2010. Recently when I tried to compose an environment I had a problem. After selecting the running VM inside the new environment wizard I got the red star that shows an error in the machine properties

image

Now I would only expect to see this when creating an environment with VM templates, as a red star usually means the OS profile is not set e.g. you have missed a product key, or passwords don’t match. However, this was a running VM so there were no settings I could make, and no obvious way to diagnose the problem. After a few emails with the Microsoft Lab Management team we got to the bottom of the problem: it was all down to the Hyper-V host’s network connections. But that is rushing ahead; first let’s see why it was a confusing problem.

First the red herring

We now know the issue was the Hyper-V host network, but at first it looked like I could compose some guest VMs but not others. I wrongly assumed the issue was some bad meta-data or corrupt settings within the VMs. This problem all started after a server crash and so we were fearing corruption, which clouded our thoughts.

The actual reason some VMs could be composed and some could not was dependent on which Hyper-V host they were running on, not the VMs themselves.

The diagnostic steps

To get to the root of this issue a few commands and tools were used. Don’t think for a second there was not a lot of random jumping about and trial and error. In this post I am just going to point out what was helpful.

Firstly you need to use the TFSConfig command on your TFS server to find out your network location setting. So run

C:\Program Files\Microsoft Team Foundation Server 11.0\Tools>tfsconfig lab /settings /list
SCVMM Server Name: vmm.blackmarble.co.uk
Network Location: VSLM Network Location
IP Block: 192.168.23.0/24
DNS Suffix: blackmarble.co.uk
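If you have more than a couple of hosts to check, it can help to get the expected location out of that output and compare it with what each host reports. The following is just a sketch: it parses the `Key: Value` lines shown above, and the host names and their locations in `host_locations` are hypothetical values you would have to note down from SCVMM yourself (nothing here talks to SCVMM or TFS).

```python
SAMPLE = """\
SCVMM Server Name: vmm.blackmarble.co.uk
Network Location: VSLM Network Location
IP Block: 192.168.23.0/24
DNS Suffix: blackmarble.co.uk
"""


def parse_settings(text: str) -> dict:
    """Turn 'Key: Value' lines from the tfsconfig output into a dictionary."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. the command prompt line
            settings[key.strip()] = value.strip()
    return settings


def hosts_missing_location(settings: dict, host_locations: dict) -> list:
    """List hosts whose network location differs from the one TFS expects."""
    wanted = settings["Network Location"]
    return [host for host, loc in host_locations.items() if loc != wanted]


settings = parse_settings(SAMPLE)
# Hypothetical hosts; the empty string stands for a host with no location set
host_locations = {"hyperv1": "VSLM Network Location", "hyperv2": ""}
print(hosts_missing_location(settings, host_locations))  # ['hyperv2']
```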

Next you need to see which, if any, of your Hyper-V hosts are connected to this location. You can do this in a few graphical ways in SCVMM (and I am sure via PowerShell too)

If you select a Hyper-V host in SCVMM, right click and select View networking. On a healthy host you see the VSLM network location connected to the external network adaptor the VMs are using

image

On my failing Hyper-V host the VSLM network was connected to an empty network port

image

You can also see this via SCVMM > host (right click) > properties. If you look on the networking tab for the main virtual network you should see the VSLM network as the location. On the failing Hyper-V host this location was empty.

image

The solution

You would naively think selecting the edit option on the screen shot above would allow you to enter the VSLM Network as the location, but no. Not on that tab. You need to select the hardware tab.

image

You can then select the correct network adaptor and override the discovered network location to point to the VSLM Network Location. Once this was done I could compose environments as I would expect.

I have said it before, but Lab Management has a lot of moving parts, and they all must be setup right else nothing works. A small configuration error can seriously ruin your day.

Did I delete the right lab?

It was bound to happen in the end, the wrong environment got deleted on our TFS Lab Management instance. The usual selection of rushing, minor mistakes, misunderstandings and not reading the final dialog properly and BANG you get that sinking feeling as you see the wrong set of VMs being deleted. Well this happened yesterday, so was there anything that can be done? Luckily the answer is yes, if you are quick.

Firstly we knew SCVMM operations are slow, so I RDP’d onto the Hyper-V host  and quickly copied the folders that contained the VMs scheduled to be deleted. We now had a copy of the VHDs.

On the SCVMM host I cancelled the delete jobs. Turns out this did not really help as the jobs just get rescheduled. In fact it may make matters worse, as the failing of jobs and their restarting seems to confuse SCVMM. It took hours before it was happy again; it kept giving ‘can’t run job as XXX in use’ and losing sight of the Hyper-V hosts (I needed to restart the VMM service in the end).

So I now had a copy of three network isolated VMs, so I

  • Created new VMs on a Hyper-V host using Hyper-V manager with the saved VHDs as their disks. I then made sure they ran and were not corrupted
  • In SCVMM cleared down the saved state so they were stopped (I forgot to do this the first time I went through this process and it meant I could not deploy the stored VMs into an isolated environment, that wasted hours!)
  • In SCVMM put them into the library on a path our Lab Management server knows about (gotcha here is SCVMM deletes the VM after putting it into the library, this is unlike MTM Lab Center which leaves the original in place, always scares me when I forget)
  • In MTM Lab Center import the new VMs from the library
  • Create a new network isolated environment with the VMs
  • Wait……………………….

When it eventually started I had a network isolated environment back to the state it was when we in effect pulled the power out. All took about 24 hours, but most of this was waiting for copies to and from the library to complete.

So the top tip is try to avoid the problem, this is down to process frankly

  • Use the ‘mark as in use’ feature to say who is using a VM
  • Put a process in place to manage the lab resources. It does not matter how much Hyper-V resource you have, you will run out in the end and be unable to add that extra VM. You need a way to delete/archive what is not currently needed
  • Read the confirmation dialogs, they are there for a reason

Why does ‘Send to > email link’ in SharePoint open Chrome on my PC?

I must have clicked something in error on my Win 8 PC, as when I open one of our SharePoint 2010 sites, select a file, right click and select Send To > Email Link, instead of an Outlook email opening my PC tries to open Chrome.

A bit of quick digging showed the issue was that the file association for mailto: was wrong. You can check this setting in IE > Internet Options > Programs > Internet Programs > set Programs (button)

image

Once I changed this to Outlook I got the behaviour I expected

TF900546, can’t run Windows 8 App Store unit tests in a TFS build

Today has been one of purging build system problems. On my TFS 2012 Windows 8 build box I was getting the following error when trying to run Windows 8 App Store unit tests

TF900546: An unexpected error occurred while running the RunTests activity: 'Unable to load one or more of the requested types. Retrieve the LoaderExceptions property for more information.'.

On further investigation, I am not really sure anything was working too well on this box. To give a bit of background

  • I have one build controller build2012
  • with a number of build agents spread across various VMs. I use tags to target the correct agent e.g. SUR40 or WIN8

In the case of Windows 8 builds (where the TFS build agent has to run on a Windows 8 box) the build seemed to run, but tests failed with the TF900546 ‘it’s broken, but I am not saying why’ error. As usual there was nothing in the logs to help.

To try to debug the error I added a build controller to this box, and eventually noticed (after far too long, and just like Martin in his post) that I was getting an error on the build service on the Windows 8 box and the agent was not fully online.

image

The main symptom is the build agent says ready, but shows a red box (stopped). If you hit the details link that appears you get the error dialog. Martin had a 500 error, I was getting a 404. I had seen similar problems before, I really should read (or at least remember) my own blog posts.

I can’t stress enough, if you don’t see a green icon on build controllers and agents you have a problem. It might not be obvious at that point but it will bite you later!

For me the fix was the URL I was using to connect to the TFS server. I was using HTTPS (SSL); as soon as I switched to HTTP all was OK. In this case this was fine as both the TFS server and build box were in the same rack so SSL was not really needed. I suspect that the solution, if I had wanted SSL, would be as Martin outlined, a config file edit to sort out the bindings.

But remember….

That having a working build system is not enough for Windows 8 App Store unit tests. You also have to manually install the application certificate for the test assembly as detailed in MSDN, as well as getting the build service running in interactive mode.

Once this was done my application built and the tests ran OK

More thoughts on addressing TF900546 ‘Unable to load one or more of the requested types’ on TFS2012

A while ago I posted about seeing the TF900546 error when running unit tests in a previously working TFS 2012 build. The full error being:

TF900546: An unexpected error occurred while running the RunTests activity: 'Unable to load one or more of the requested types. Retrieve the LoaderExceptions property for more information.'.

Well late last week this problem came back with a vengeance on a number of builds run on the same build controller/agent(s). Irritatingly I first noticed it after a major refactor of a codebase, so I had plenty of potential root causes as assemblies had been renamed and it was possible they might not be found. However, after a bit of testing there were no obvious candidates as all tests worked fine locally on my development PC, and a new very simple test application showed the same issues. It was definitely an issue on the build system.

I can still find no good way to debug this error. Stack Overflow mentions Fuslogvw and WinDbg, as well as various copy local settings and the like. Again this all seems too much, as this build was working in the past and just seemed to stop. I tried a couple but got no real information, and the error logs were empty.

In the end I just tried what I did before (as I could think of no better tactic to pin down the true issue). I went into the build controller config, removed the reference to the custom assemblies, saved this setting (causing a controller restart), then put it back (another restart of the controller)

image

After this my tests started working again, with no other changes

Interestingly, a restart of the VM running the build controller did not fix the problem. However, this does somewhat chime with comments in the Stack Overflow thread that causing the AppPool in MVC apps to rebuild completely, ignoring any cached assemblies, seems to fix the issue.