BM-Bloggers

The blogs of Black Marble staff

Lab and SCVMM – They go together like a horse and carriage

Hello and welcome to my blog; this is my first post. I’m a junior developer in test here at Black Marble and my day-to-day job role is starting to encompass DevOps as well as testing.

There’s a project we’ve been working on that I have been automating using an end-to-end Build-Deploy workflow with Lab Management. That exercise will be covered in another blog post, as I’m still working on it.

What I wanted to share with you all today is a problem I encountered (and solved) while working on that project.

 

Lab Management

For those not in the know, Microsoft Lab Management is test environment management software, most notably for managing VMs. We use it at Black Marble to provision dev and test environments for our projects, and for the most part it’s fairly straightforward for me or Tom Barnes (my counterpart in test) to roll out a new environment to order for a given project, once IT have built the template/stored environment we’re using for that project.

In my auto-deploy exercise I was doing some really heavy shuffling of snapshots on the environment I was deploying to.

Basically, the four-box environment (consisting of a SharePoint 2013 server, a CRM 2011 server, a SQL 2012 server and a DC) had, at any one time, one snapshot: a sterile point in time prior to me attempting to deploy on top of it. If my auto deploy didn’t work because of an environment setting, I went and changed the setting, created a new snapshot and deleted the old one.

Lo and behold, today I tried to revert to my newest clean snapshot and Lab Manager told me:

A) It couldn’t apply the snapshot

B) The snapshot tree was corrupted and I needed to create a new tree.

My immediate response was to run crying to Rik Hepworth, our resident witch doctor for all things SCVMM, as this had happened before and he’d fixed it last time, believing it was just a corrupted snapshot. This time we found the problem was far more… odd.

We opened the System Center Virtual Machine Manager console, logged onto the SCVMM server and examined my environment (now in a mixed state), and the reason for the snapshot tree being utterly rodded was fairly evident.

 

The Truth Comes Out

Lab Manager lies! To itself and to the user.

My SharePoint VM had 4 snapshots

My CRM VM had 3 snapshots

My DC had 1 snapshot

My SQL server had 1 snapshot

…on top of the one I’d just created prior to getting into this mess.

But wait, hadn’t I been deleting the old snapshots as I no longer needed them? Well yes, but that didn’t actually mean Lab Manager went and did it.

Lab Manager believed it had only one snapshot for the whole environment, which was not the truth by a country mile. What had happened repeatedly, it seems, is that the deletion jobs I had triggered in Lab Manager had occasionally failed within SCVMM, but Lab Manager didn’t know this because it doesn’t talk to SCVMM all that well. Lab Manager fires and forgets commands at SCVMM. As near as we can figure, it keeps its own version of the truth regarding its machines’ snapshots and holds its hands over its ears shouting ‘la la la’ when the real world doesn’t match up with what it’s expecting.

All well and good to know for the future, but what about my poor environment that I’d been slaving over for three weeks?

All was not lost

Fixing It

My environment was not beyond recovery. In SCVMM we paused everything else running on the same tin as my environment; this was to minimize the risk of jobs timing out mid-process, something we’ve had problems with previously.

For each VM we opened its properties and removed each of the previous checkpoints until only the latest snapshot remained (as shown below).

image

 

Update: This happened to me again today, and it seems the problem can vary in severity; this time I had to delete the ENTIRE snapshot tree, not just the latest snapshot. Until I had done this, all new SCVMM jobs to apply snapshots to the environment en masse failed (I had to do them one by one), and any new snapshots I made suffered the same problem. Manually apply a safe snapshot to each machine in your environment and then, if possible (for, in my experience, the most reliable results), delete the entire tree and snapshot it again to create a new one.

We then applied this snapshot to each of the VMs in the environment one by one (rather than Lab Manager’s shotgun approach of doing them all at once, which, depending on how much resource your tin has, can behave very erratically).

Once the VMs were back at this checkpoint we deleted the old checkpoint and created a new one (because we were paranoid that the old one was inherently shifty).

We then fired them back up and, presto, my dead environment was resurrected via SCVMM necromancy. It was even in the right state by the time all the difference disks had been merged.
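For anyone who would rather script the recovery than click through the console, here is a minimal sketch of the same steps using the SCVMM 2012 PowerShell cmdlets. The server name and VM names are made up to match my environment; treat it as an outline rather than a tested tool.

Import-Module virtualmachinemanager
$vmm = Get-SCVMMServer -ComputerName "scvmm01"                    # placeholder server name

foreach ($vmName in "SP2013", "CRM2011", "SQL2012", "DC") {        # placeholder VM names
    $vm = Get-SCVirtualMachine -VMMServer $vmm -Name $vmName

    # apply the one checkpoint you trust, machine by machine rather than en masse
    $good = Get-SCVMCheckpoint -VM $vm | Sort-Object AddedTime | Select-Object -Last 1
    Restore-SCVMCheckpoint -VMCheckpoint $good | Out-Null

    # then clear the whole tree and take a fresh checkpoint as the new clean baseline
    Get-SCVMCheckpoint -VM $vm | ForEach-Object { Remove-SCVMCheckpoint -VMCheckpoint $_ | Out-Null }
    New-SCVMCheckpoint -VM $vm -Name "Clean baseline" | Out-Null
}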

Bonus.

In Summary


Be careful with heavy use of snapshots in Lab Management: the larger the environment, the more likely things are to go wrong. Lab Manager will not tell you when a snapshot deletion has failed; it really is a fire-and-forget process as far as it’s concerned.

If you get a scary error when your environment is in a mixed state after attempting to apply a snapshot, then fear not: so long as you have a witch doctor on hand, your environment can be recovered to the state you originally intended.

If you are curious about Lab Management and SCVMM, I strongly recommend you look at the following blogs.

Check back in the next week for my completed lab deploy war story, and thanks for reading!

Fix for ‘Cannot install test agent on these machines because another environment is being created using the same machines’

I recently posted about adding extra VMs to existing environments in Lab Management. Whilst following this process I hit a problem: I could not create my new environment because it failed at the verify stage. Adding the new VMs was fine, but the one I was reusing gave the error ‘Microsoft Test Manager cannot install test agent on these machines because another environment is being created using the same machines’.

image

I had seen this issue before, so I tried a variety of things that had sorted it in the past: removing the TFS test agent on the VM, manually installing and trying to configure it, and reading through the test controller logs, all to no effect. I eventually got a solution today with the help of Microsoft.

The answer was to do the following on the VM showing the problem:

  1. Kill the TestAgentInstaller.exe process, if it is running on the failing machine
  2. Delete the “TestAgentInstaller” service using the sc delete testagentinstaller command (gotcha here: use a DOS-style command prompt, not PowerShell, as sc has a different default meaning in PowerShell – it is an alias for Set-Content; if you use PowerShell you need the full path to sc.exe). See the sketch after this list
  3. Delete the C:\Windows\VSTLM_RES folder
  4. Restart the machine, then try the lab environment creation again and all should be OK
  5. As usual, once the environment is created you might need to restart it to get all the test agents to link up to the controller OK
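If you would rather script the clean-up, this is roughly the same sequence in PowerShell, run elevated on the failing VM (a minimal sketch of the steps above, not a tested tool):

Stop-Process -Name TestAgentInstaller -Force -ErrorAction SilentlyContinue           # step 1: kill the installer if it is running
& "$env:windir\System32\sc.exe" delete TestAgentInstaller                            # step 2: full path dodges PowerShell's sc alias (Set-Content)
Remove-Item "$env:windir\VSTLM_RES" -Recurse -Force -ErrorAction SilentlyContinue    # step 3: remove the folder
Restart-Computer                                                                     # step 4: reboot, then retry the environment creation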

So it seems that the removal of the VM from its old environment left some debris that was confusing the verify step. It seems this only happens rarely, but it can be a bit of a showstopper if you can’t get around it.

Making the drops location for a TFS build match the assembly version number

A couple of years ago I wrote about using the TFSVersion build activity to try to sync the assembly version and build number. I did not want to see build names/drop locations in the format 'BuildCustomisation_20110927.17’; I wanted to see the version number in the build, something like 'BuildCustomisation 4.5.269.17'. The problem, as I outlined in that post, was that by fiddling with the BuildNumberFormat you could easily cause an error where duplicated drop folder names were generated, such as:

TF42064: The build number 'BuildCustomisation_20110927.17 (4.5.269.17)' already exists for build definition '\MSF Agile\BuildCustomisation'.

I had put this problem aside, thinking there was no way around the issue, until I was recently reviewing the new ALM Rangers ‘Test Infrastructure Guidance’. This had a solution to the problem included in the first hands-on lab. The trick is that you need to use the TFSVersion community extension twice in your build.

  • You use it as normal to set the version of your assemblies after you have got the files into the build workspace, just as the wiki documentation shows
  • But you also call it in ‘get mode’ at the start of the build process, prior to calling the ‘Update Build Number’ activity. The core issue is that you cannot call ‘Update Build Number’ more than once, or you tend to see the TF42064 issue. By using it in this manner you set the BuildNumberFormat to the actual version number you want, which will be used for the drops folder and any assembly versioning.

So what do you need to do?

  1. Open your process template for editing (see the custom build activities documentation if you don’t know how to do this)
  2. Find the sequence ‘Update Build Number for Triggered Builds’ at the top of the process template

    image
    • Add a TFSVersion activity – I called mine ‘Generate Version number for drop’
    • Add an Assign activity – I called mine ‘Set new BuildNumberFormat’
    • Add a WriteBuildMessage activity – this is optional, but I do like to see what was generated
  3. Add a string variable GeneratedBuildNumber with the scope of ‘Update Build Number for Triggered Builds’

    image
  4. The properties for the TFSVersion activity should be set as shown below

    image
    • The Action is the key setting; this needs to be set to GetVersion, as we only need to generate a version number, not set any file versions
    • You need to set the Major, Minor and StartDate settings to match the other copy of the activity in your build process. A good tip is to just copy and paste from the other instance to create this one, so that the bulk of the properties are correct
    • The Version needs to be set to your variable GeneratedBuildNumber; this is the output version value
  5. The properties for the Assign activity are as follows

    image
    • Set To to BuildNumberFormat
    • Set Value to String.Format("$(BuildDefinitionName)_{0}", GeneratedBuildNumber); you can vary this format to meet your own needs [updated 31 Jul 13 – better to use an _ rather than a space, as this will be used in the drop path]
  6. I also added a WriteBuildMessage activity that outputs the generated value, but that is optional

Once all this was done and saved back to TFS it could be used for a build. The build name and drops location are now in the form:

[Build name] [Major].[Minor].[Days since start date].[TFS build number]
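To make the numbering scheme concrete, here is a little PowerShell sketch of how those blocks line up (the values are purely illustrative – in the real build the TFSVersion activity does this work for you):

$major = 4 ; $minor = 5                                            # the Major/Minor properties on the activity
$startDate = Get-Date '2012-01-01'                                 # the StartDate property (illustrative date)
$days = (New-TimeSpan -Start $startDate -End (Get-Date)).Days      # third block: days since the start date, e.g. 269
$tfsNumber = 17                                                    # fourth block: the number TFS allocated to this build
$version = '{0}.{1}.{2}.{3}' -f $major, $minor, $days, $tfsNumber
'BuildCustomisation_{0}' -f $version                               # -> BuildCustomisation_4.5.269.17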

image

This is a slight change from what I previously attempted, where the fourth block was the count of builds of a given type on a given day; now it is the unique TFS-generated build number, the number allocated before the build name is generated. I am happy with that. My key aim is achieved: the drops location contains the product version number, so it is easy to relate a build to a given version without digging into the build reports.

I can never remember the command line to add users to the TFS Service Accounts group

I keep forgetting that when you use the TFS Integration Platform, the user the tool runs as (or the service account, if it is running as a service) has to be in the “Team Foundation Service Accounts” group on the TFS servers involved. If they are not, you get a runtime conflict something like:

Microsoft.TeamFoundation.Migration.Tfs2010WitAdapter.PermissionException: TFS WIT bypass-rule submission is enabled. However, the migration service account 'Richard Fennell' is not in the Service Accounts Group on server 'http://tfsserver:8080/tfs'.

The easiest way to do this is to use the TFSSecurity command-line tool on the TFS server. You will find some older blog posts about setting the user as a TFS admin console user to get the same effect, but that only seems to work on TFS 2010. This command is good for all versions:

C:\Program Files\Microsoft Team Foundation Server 12.0\tools> .\TFSSecurity.exe /g+ "Team Foundation Service Accounts" n:mydomain\richard /server:http://localhost:8080/tfs

and expect to see

Microsoft (R) TFSSecurity - Team Foundation Server Security Tool
Copyright (c) Microsoft Corporation.  All rights reserved.

The target Team Foundation Server is http://localhost:8080/tfs.
Resolving identity "Team Foundation Service Accounts"...
s [A] [TEAM FOUNDATION]\Team Foundation Service Accounts
Resolving identity "n:mydomain\richard"...
  [U] mydomain\Richard
Adding Richard to [TEAM FOUNDATION]\Team Foundation Service Accounts...
Verifying...

SID: S-1-9-1551374245-1204400969-2333986413-2179408616-0-0-0-0-2

DN:

Identity type: Team Foundation Server application group
   Group type: ServiceApplicationGroup
Project scope: Server scope
Display name: [TEAM FOUNDATION]\Team Foundation Service Accounts
  Description: Members of this group have service-level permissions for the Team Foundation Application Instance. For service accounts only.

1 member(s):
  [U] mydomain\Richard

Member of 2 group(s):
e [A] [TEAM FOUNDATION]\Team Foundation Valid Users
s [A] [DefaultCollection]\Project Collection Service Accounts

Done.

Once this is done and the integration platform run is restarted, all should be OK.

An attempted return for ‘Brian the build bunny’

Background

Back in 2008 Martin Woodward did a post on using a Nabaztag as a build monitor for TFS, ‘Brian the build bunny’. I did a bit more work on this idea and wired it into our internal build monitoring system. We ended up with a system where a build definition could be tagged so that its success or failure caused the Nabaztag to speak a message.

image

This all worked well until the company that made the Nabaztag went out of business, the problem being that all communication with your rabbit was via their web servers. At the time we did nothing about this and just stopped using this feature of our build monitors.

Getting it going again

When the company that made the Nabaztag went out of business, a few replacements for their servers appeared. I chose to look at the PHP-based one, OpenNab, my longer-term plan being to use a Raspberry Pi as a ‘backpack’ server for the Nabaztag.

Setting up your Apache/PHP server

I decided to start with an Ubuntu 12.04 LTS VM to check out the PHP-based server; it was easier to fiddle with whilst travelling, as I did not want to carry around all the hardware.

Firstly I installed Apache 2 and PHP 5, using the commands:

sudo apt-get install apache2
sudo apt-get install php5
sudo apt-get install libapache2-mod-php5
sudo /etc/init.d/apache2 restart

I then downloaded the OpenNab files and unzipped them into /var/www/vl

Next I started to work through the instructions at http://localhost/vl/check_install.html and instantly hit problems.

The first test checks that when you ask for a page that does not exist (a 404 error) the request is redirected to the bc.php page. The need for this is that the Nabaztag will make a call to bc.jsp; this cannot be altered, so we need to redirect the call. This is meant to be handled by a .htaccess file in the /var/www/vl folder that contains:

ErrorDocument 404 /vl/bc.php

I could not get this to work. In the end I edited the Apache /etc/apache2/httpd.conf file and put the same text there. I am no expert on Apache, but the notes I read seemed to imply that httpd.conf was being favoured over .htaccess, so it might be a version issue (it may simply be that the default Ubuntu config sets AllowOverride None for /var/www, which stops .htaccess files being read).

Once this change was made I got the expected redirections: asking for an invalid folder or page caused the bc.php file to be loaded (it shows a special 404 message – watch out for this, the text in the message is important; I had thought mine was working before because I saw a 404, but it was Apache reporting the error, not the bc.php page).

Next I went to http://localhost/vl/tests to run all the PHP tests. Most passed, but I did see a couple of failures and loads of exceptions. The fixes were:

  • Failure of 'testFileGetContents' – this is down to whether Apache returns compressed content or not. You need to disable this feature by running the command

sudo a2dismod deflate

  • All the exceptions are because deprecated calls are being made (OpenNab is a few years old). I edited the /etc/php5/apache2/php.ini file and set error reporting not to show deprecation warnings. Once this was done the PHP tests all passed:

error_reporting = E_ALL & ~E_NOTICE & ~E_DEPRECATED

Next I could try a call to a dummy address, see that files appeared in the ‘burrows’ folder, and confirm that I got a gibberish message returned. This proved the redirect worked and I had all the tools wired up.

Note: some people have had permission problems; you might need to grant write permissions on the folder, as temporary files are created, but this was not something I needed to alter.

It is a good idea to make sure you have no firewall issues by accessing the test pages from another PC/VM.
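A quick way to do that check from a Windows PC is a couple of lines of PowerShell (the server address below is a placeholder for your Apache box). The important thing is that the body of the 404 response is bc.php’s special message, not Apache’s stock error page:

$server = "192.168.0.10"                                           # placeholder: the IP of the Apache/OpenNab server
try {
    (New-Object System.Net.WebClient).DownloadString("http://$server/vl/nosuchpage.jsp")
} catch {
    $webEx = $_.Exception.GetBaseException()                       # the 404 still carries the body we care about
    (New-Object System.IO.StreamReader($webEx.Response.GetResponseStream())).ReadToEnd()
}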

Getting the Rabbit on the LAN

Note: I needed to know the IP address of my PHP server. Usually you would use DNS and maybe DHCP leases to manage this; for my test I just hard-coded it. The main reason was that Ubuntu could not use WiFi-based DHCP on my Hyper-V VM.

The first thing to note is that the Nabaztag Version 2 I am using does not support WPA2 for WiFi security. It only supports WPA, so I had to dig out a separate base station for it to use, as I did not want to downgrade my WiFi security.

Note: the Nabaztag Version 1 only does WEP; if you have one of those you need to set your security appropriately.

To set up the Nabaztag:

  • Hold down the button on the top and switch it on; the nose should go purple
  • On a PC, look for new WiFi base stations and connect to the one with a name like Nabaztag1D
  • In a browser, connect to 192.168.0.1 and set the Nabaztag to connect to your WiFi base station

image

  • In the advanced options, you also need to set the IP address or DNS name of your new OpenNab server

image

  • When you save, the unit should reboot
  • Look for a new entry in the /vl/burrows folder on your OpenNab server

At this point I thought it was working, but the Nabaztag kept rebooting: I saw three tummy LEDs go green, but the nose flashed orange/green and then it rebooted.

After much fiddling I think I worked out the problem. The OpenNab software is a proxy; it still, by default, calls the old Nabaztag site. Part of the boot process is to pull a bootcode.bin file down from the server to allow the unit to boot, and this was failing.

To fix this I did the following

  • Edited the /vl/opennab.ini file
    • Set LogLevel = 4 so I got as much logging as possible in the /vl/logs folder
    • Set ServerMode = standalone so that it does not try to talk to the original Nabaztag site
    • I saw the entry BootCode = /vl/plugin/saveboot/files/bootcode.bin, pointing at a file I did not have. The only place I could find a copy was on the volk Nabaztag tools site
  • Once all these changes were made my Nabaztag booted OK: I got four green LEDs and the ears rotated

 

When you power up the Nabaztag, it runs through a start-up sequence with orange and green lights. Use this to check where there's a problem:

  • First belly light is the connection to your network - green is good
  • Second belly light is that the bunny has got an IP address on your network - green is good
  • Third belly light means that the bunny can resolve the server web address - green is good
  • The nose light confirms whether the server is responding to the rabbit's requests - green is good

A pulsing purple light underneath the rabbit means that the Nabaztag is connected and working OK.

Sending Messages to the Rabbit on the LAN

Now I could try sending messages via the API demo pages. The messages seemed to be sent OK, but nothing happened on the rabbit. I was unsure whether it had booted OK or even whether the bootcode.bin file was correct.

At this point I got to thinking: the main reason I wanted this working again was the text-to-speech (TTS) system. This is not part of the OpenNab server; that function is passed off to the original Nabaztag service. So was all this work going to get me what I wanted?

I was at the point where I had learnt loads about getting Apache, PHP and OpenNab going, but frankly I was no nearer what I was after.

A Practical Solution

At this point I went back to look at the other replacement servers. I decided to give Nabaztaglives.com a go, and it just worked – just follow their setup page. They provide TTS using the Google API, which is just what I needed.

OK, it is not a Raspberry Pi backpack, nor a completely standalone solution, but I do have the option of using the Nabaztag in the same manner as I used to, as a means of signalling build problems.
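For reference, a TTS call against the Violet-style API (which these replacement servers emulate) looks something like the sketch below; the host, serial number and token are placeholders, not real values:

$apiUrl = "http://your-rabbit-server/vl/FR/api.jsp"                # placeholder: your OpenNab/Nabaztaglives server
$serial = "0013D3000000"                                           # placeholder: the rabbit's serial number
$token  = "1234567890"                                             # placeholder: the API token for that rabbit
$text   = "Build BuildCustomisation 4.5.269.17 failed"

$url = "{0}?sn={1}&token={2}&tts={3}" -f $apiUrl, $serial, $token, [Uri]::EscapeDataString($text)
(New-Object System.Net.WebClient).DownloadString($url)             # the rabbit should speak the message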

Adding another VM to a running Lab Management environment

If you are using a network-isolated environment in TFS Lab Management there is no way to add another VM unless you rebuild and redeploy the environment. However, if you are not network isolated, you can at least avoid the redeployment issues to a degree.

I had an SCVMM-based environment that was not network isolated and contained a single non-domain-joined server. This was used to host a backend simulation service for a project. In the next phase of the project we needed to test accessing this service via RDP/Terminal Services, so I wanted to add a VM to the environment to act in this role.

So firstly I deleted the environment in MTM; as the VMs in the environment are not network isolated they are not removed, the only change being that the XML metadata is removed from their properties description.

I now needed to create my new VM. I had thought I could create a new environment by adding the existing deployed and running VM as well as a new one from the SCVMM library. However, you get the error ‘cannot create an environment consisting of both running and stored VMs’.

image

So here you have two options.

  1. Store the running VM in the library and redeploy
  2. Deploy out, via SCVMM, a new VM from some template or stored VM

Once this is done you can create the new environment using the running VMs or stored images depending on the option chosen in the previous step.

So there is no huge saving in time or effort. I just wish there was a way to edit deployed environments.

Experiences with a Kindle Paperwhite

I wrote a post a while ago about ‘should I buy a Kindle?’. Well, I put it off for over a year, using the Kindle app on my WP7 phone, reading the best part of 50 books and being happy enough without buying an actual Kindle. The key issue was poor battery life, but that’s phones for you.

However, I have eventually got around to getting a Kindle device. The key was that I had been waiting for something that used touch, had no keyboard, but most importantly worked in the dark without an external light. This is because I found one of the most useful features of the phone app was reading in bed without the need for a light.

This is basically the spec of the Kindle Paperwhite, so I had no excuse to delay any longer.

Kindle Paperwhite e-reader

 

This week was my first trip away with it, and it was interesting to see my usage pattern. On the train and in the hotel I used the Kindle, but standing on the railway station or generally waiting around I still pulled out my phone to read. This meant I had to put my phone into WiFi hotspot mode so the Kindle could sync my last read point via Whispersync when I wanted to switch back to it. This was because I had not bought the 3G version of the Paperwhite, and I still don’t think I would bother to, as firing up a hotspot is easy when I am on the road and the Kindle uses my home and work WiFi most of the time.

So I have had it for a few weeks now and must say I am very happy with it; I can heartily recommend it. I still have reservations over having to carry another device, but it is so much more pleasant to read on the Kindle screen. So most of the time it is worth carrying, and when it is not I just use my phone.

Thoughts on Prism for the Windows Runtime

It was good to see an updated version of Prism that works with WinRT become available. We’ve now used it on a couple of Win8 apps and it has worked reasonably well.

Its focus is on rapid app delivery and it has useful features to cater for some Win8-specific concerns, such as assembling Flyouts and handling SessionState. These are useful features that save a modicum of time when starting a new app.

Its major flaw (in my opinion) is that it lacks support for other .NET variants. I was hoping to see more abstractions moved into a Portable Class Library, which could allow code re-use across classic .NET, Windows 8 apps and Windows Phone apps. As it stands, its features can only be used within the realms of a Windows 8 app (with the exception of the EventAggregator).

This also has a knock-on effect with unit testing.

Unit testing a Win8 app is severely limited: there are challenges with build integration and a lack of a decent mocking framework. A common technique is to architect the app in such a way that the UI logic (the ViewModels) lives in a separate PCL assembly. That logic can then be tested from classic .NET test assemblies to take advantage of all the great mocking frameworks available.

Unfortunately, the Prism build downloaded from CodePlex targets .NET for WinRT, which makes this technique impossible.

It’s possible to modify the Prism framework to cater for this unit testing technique. I did so on a recent project and had good success… However, it’s a shame it didn’t come out of the box this way.


ResourceMap Not Found Exception message

I blogged previously about the differences in accessing resources within a .NET Windows Store app compared with a classic .NET application.

Recently I included a 3rd-party assembly in my app and received the exception:

‘ResourceMap Not Found’

This indicates that a request for a particular resource has failed because the resource cannot be found. It would be nice if the error message told you which resource it was unable to find. In my case it was due to the 3rd-party library using string resources that could not be found.

To debug this I took a look at the resource file that the app was using. This is found in the \bin\debug\ folder of the app and is named ‘resources.pri’. This file is binary and you cannot read it directly; instead you must use the command-line tool ‘makepri.exe’ to dump a human-readable version of the file.

  1. Open a developer command prompt
  2. Navigate to the ‘~\project\bin\debug\’ folder
  3. Run the command ‘makepri.exe dump’


This will output an XML version of the resources used by the application – including any that were associated with my 3rd-party assembly.

Effectively, all resources (including 3rd-party ones) should be merged into this one file at compile time. However, to do the merge the 3rd-party resources need to be available, and located relative to the 3rd-party assembly (or in the bin folder). If the compiler does not find them, they do not get merged into resources.pri and your app breaks at run time.

  • So check that you have the .pri files for 3rd-party assemblies and that you have put them on disk alongside the referenced assembly (.dll). The .pri file should have the same name as the assembly; for instance, if you reference an assembly called ‘Prism.dll’ then you should have a ‘Prism.pri’ in the same folder as it.
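If you want a quick way to spot candidates, a small PowerShell sketch along these lines can help (the folder name is hypothetical, and not every assembly ships resources, so treat the output as things to investigate rather than definite failures):

cd .\MyApp\bin\Debug                                               # hypothetical project output folder
makepri.exe dump                                                   # from a developer command prompt; writes the XML dump you can search

# list referenced assemblies that do not have a matching .pri file next to them
Get-ChildItem *.dll | Where-Object { -not (Test-Path ($_.BaseName + ".pri")) } | Select-Object Name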

 
