But it works on my PC!

The random thoughts of Richard Fennell on technology and software development

Our TFS Lab Management Infrastructure

After my session at Techorama last week I was asked some questions about how we built our TFS Lab Management infrastructure. Well, here is a bit more detail, with thanks to Rik for correcting what I had misremembered and providing much of the detail.

image

For SQL we have two physical servers with Intel processors. Each has a pair of mirrored disks for the OS and a RAID5 group of disks for data. We use SQL 2012 Enterprise Always On for replication to keep the DBs in sync. The servers are part of a Windows cluster (needed for Always On) and we use a VM to provide a third server in the witness role; this is hosted on a production Hyper-V cloud. We have a number of availability groups on this platform, basically one per service we run. This allows us to split the read/write load between the two servers (unless they have failed over to a single box). If we had only one availability group for all the DBs, one node would be doing all the read/write work and the other would be read-only, so not that balanced.

SCVMM runs on a physical server with a pair of hardware-mirrored 2TB disks giving 2TB of storage. That’s split into two partitions, as you can’t use data de-duplication on the OS volume of Windows. This allows us to store something like 5TB of Lab VM images on the SCVMM library share that’s hosted on the SCVMM server. This share is for Lab Management use only.
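For anyone wanting to try the same thing, enabling de-duplication on a data volume is only a few lines of PowerShell on Windows Server 2012 R2. A minimal sketch, assuming the library volume is D: (the drive letter is just an example):

# Add the de-duplication feature and enable it on the data volume (not the OS volume)
Install-WindowsFeature -Name FS-Data-Deduplication
Enable-DedupVolume -Volume "D:" -UsageType Default

# Kick off an optimisation job now rather than waiting for the schedule,
# then check how much space has been saved
Start-DedupJob -Volume "D:" -Type Optimization
Get-DedupStatus -Volume "D:"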

We also have two physical servers that make up a Windows cluster with a Cluster Shared Volume on an iSCSI SAN. This hosts a number of SCVMM libraries for ISO images, production VM images and test stuff. Data de-duplication again gives us an 80% space saving on the SAN (ISO images of OSes and VHDs of installed OSes dedupe _really_ well).

Our Lab cloud currently has three AMD-based servers. They use the same disk setup as the SQL boxes, with a mirrored pair for the OS and RAID5 for VM storage.

Our Production Hyper-V also has three servers, but this time in a Windows Cluster using a Cluster Shared Volume on our other iSCSI SAN for VM storage so it can do automated failover of VMs.

Each of the SQL servers, SCVMM servers and Lab Hyper-V servers uses Windows Server 2012 R2 NIC teaming to combine 2 x 1Gbit NICs, which gives us better throughput and failover. The lab servers have one team for VM traffic and one team for the Hyper-V management traffic that is used when deploying VMs. That means we can push VMs around pretty much as fast as the disks will move data in either direction, without needing expensive 10Gbit Ethernet.
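Setting up such a team is a one-liner per team in Windows Server 2012 R2 PowerShell. A rough sketch, where the team and adapter names are examples rather than our actual configuration:

# One team for VM traffic, one for Hyper-V management; adapter names are examples
New-NetLbfoTeam -Name "VMTraffic" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
New-NetLbfoTeam -Name "HVManagement" -TeamMembers "NIC3","NIC4" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic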

So I hope that answers any questions.

What I learnt getting Release Management running with a network Isolated environment

In my previous post I described how to get a network isolated environment up and running with Release Management; it is all to do with shadow accounts. Well, getting it running is one thing, having a useful release process is another.

For my test environment I needed to get three things deployed and tested

  • A SQL DB deployed via a DACPAC
  • A WCF web service deployed using MSDeploy
  • A web site deployed using MSDeploy

My environment was a four VM network isolated environment running on our TFS Lab Management system.

image 

The roles of the VMs were

  • A domain controller
  • A SQL 2008R2 server (Release Management deployment agent installed)
  • A VM configured as a generic IIS web server (Release Management deployment agent installed)
  • A VM configured as an SP2010 server (not needed yet, but its presence caused me issues, so it is worth mentioning)

Accessing domain shares

The first issue we encountered was that the deployment agents on the VMs need to be able to access domain shares on our corporate network, not just ones in the local network isolated domain. They need to be able to do this to download the actual deployment media. The easiest way I found to do this was to place a NET USE command at the start of the workflow for each VM I was deploying to. This allowed authentication from the test domain to our corporate domain and hence gave the agent access to the files it needed. The alternatives would have been more shadow accounts or cross domain trusts, both things I did not want the hassle of managing.

image

The Run Command Line activity runs the net command with the arguments: use \\store\dropsshare [password] /user:[corpdomain\account]

I needed to use this command on each VM I was running the deployment agent on, so it appears twice in this workflow, once for the DB server and once for the web server.
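For reference, the full command that ends up being run on each VM is just the standard net use; the password and account below are placeholders, not our real values:

# Placeholder password and account – authenticates the test-domain session against
# the corporate domain so the agent can pull down the deployment media
net use \\store\dropsshare P@ssw0rd /user:corpdomain\deployaccount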

Version of SSDT SQL tools

My SQL instance was SQL 2008R2; when I tried to use the standard Release Management DACPAC Database Deployer tool it failed with assembly load errors. Basically the assemblies downloaded as part of the tool deployment did not match anything on the VM.

My first step was to install the latest SQL 2012 SSDT tools on the SQL VM. This did not fix the problem, as there was still a mismatch between the assemblies. I therefore created a new tool in the Release Management inventory; this was a copy of the existing DACPAC tool command, but using the current version of the tool assemblies from SSDT 2012.

image

Using this version of the tools worked; my DB could be deployed/updated.
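If you want to sanity check the SSDT 2012 tooling on the target VM outside of Release Management, you can deploy a DACPAC directly with SqlPackage.exe from the same install. The path, file and server names below are assumptions for illustration, not the exact command the Release Management tool runs:

# Paths, server and database names are placeholders
& "C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe" /Action:Publish /SourceFile:"C:\Drops\MyDb.dacpac" /TargetServerName:"SQLVM" /TargetDatabaseName:"MyDb"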

Granting Rights for SQL

Using SSDT to deploy a DB (especially if you have the package set to drop the DB) does not grant any user access rights.

I found the easiest way to grant the rights the web service AppPool accounts needed was to run a SQL script. I did this by creating a component for my release with a small block of SQL to create DB owners; this is the same technique as used for the standard SQL create/drop activities shipped in the box with Release Management.

The arguments I used for the sqlcmd were -S __ServerName__ -b -Q "use __DBname__ ; create user [__username__] for login [__username__];  exec sp_addrolemember 'db_owner', '__username__';"

image

Once I had created this component I could pass the parameters I needed to add DB owners.
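With the placeholders substituted, the call the component makes ends up looking something like the line below; the server, database and account names here are illustrative only:

sqlcmd -S SQLVM -b -Q "use SABSTest ; create user [Proj\ProjIIS75$] for login [Proj\ProjIIS75$]; exec sp_addrolemember 'db_owner', 'Proj\ProjIIS75$';"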

Creating the web sites

This was straightforward; I just used the standard components to create the required AppPools and web sites. It is worth noting that these commands can be run against an existing site; they don’t error if the site/AppPool already exists. This seems to be the standard model with Release Management, as there is no decision (if) branching in the workflow, so all tools have to either work or stop the deployment.
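You can get the same ‘only create it if it is missing’ behaviour by hand with the WebAdministration module. A rough sketch to show the idea (the site name, port and path are placeholders, and this is not what the standard components actually run):

Import-Module WebAdministration

# Only create the AppPool and site if they are not already there, mirroring the
# idempotent behaviour of the standard Release Management components
if (-not (Test-Path "IIS:\AppPools\SABSTestHarness")) { New-WebAppPool -Name "SABSTestHarness" }
if (-not (Test-Path "IIS:\Sites\SABSTestHarness")) {
    New-Website -Name "SABSTestHarness" -Port 8080 -PhysicalPath "C:\inetpub\SABSTestHarness" -ApplicationPool "SABSTestHarness"
}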

image

I then used the irmsdeploy.exe Release Management component to run the MSDeploy publish on each web site/service

image

A note here: you do need to make sure you set the path to the package to be the actual folder the .ZIP file is in, not the parent drop folder (in my case Lab\_PublishedWebsites\SABSTestHarness_Package, not Lab).
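That folder contains the usual Web Deploy artefacts, so you can also test the publish by hand on the target VM; the .deploy.cmd file name below is an assumption based on my package name:

# Run from inside the package folder; /T is a what-if run, /Y performs the deployment
& ".\SABSTestHarness.deploy.cmd" /T
& ".\SABSTestHarness.deploy.cmd" /Y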

image

Running some integration tests

We now had a deployment that worked. It pulled the files from our corporate LAN and deployed them into a network isolated lab environment.

I now wanted to run some tests to validate the deployment. I chose to use some SQL based tests that were run via MSTest. These tests had already been added to Microsoft Test Manager (MTM) using TCM, so I thought I had all I needed.

I added the Release Management MTM component to my workflow and set the values taken from MTM for test plan and suite etc.

image

However I quickly hit cross domain authentication issues again. The Release Management component does all this test management via a PowerShell script that runs TCM. This must communicate with TFS, which in my system was in the other domain, so it fails.

The answer was to modify the PowerShell script to also pass some login credentials

image

The only change in the PowerShell script was that each time the TCM command is called the /login:$LoginCreds block is added, where $LoginCreds are the credentials passed in the form corpdomain\user,password

$testRunId = & "$tcmExe" run /create /title:"$Title" /login:$LoginCreds /planid:$PlanId /suiteid:$SuiteId /configid:$ConfigId /collection:"$Collection" /teamproject:"$TeamProject" $testEnvironmentParameter $buildDirectoryParameter $buildDefinitionParameter $buildNumberParameter $settingsNameParameter $includeParameter
   

An interesting side note: if you run the TCM command at a command prompt you only need to provide the credentials the first time it is run, as they are cached. This does not seem to be the case inside the Release Management script; TCM is run three times and you need to pass the credentials each time.
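The shape of the change is roughly as follows. This is a trimmed-down sketch rather than the real Release Management script, and the tcm.exe path and sample argument values are assumptions:

# Credentials are passed into the script in the form corpdomain\user,password
param(
    [string]$LoginCreds
)

# Default VS 2013 location of tcm.exe – adjust to suit your install
$tcmExe = "C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\tcm.exe"

# The /login argument has to be appended to every tcm.exe call in the script,
# as the credentials are not cached between calls
$testRunId = & "$tcmExe" run /create /title:"Nightly integration run" /login:$LoginCreds /planid:1 /suiteid:2 /configid:3 /collection:"http://tfsserver:8080/tfs/DefaultCollection" /teamproject:"Proj"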

Once this was in place, and suitable credentials added to the workflow, I expected my tests to run. They did, but 50% failed – why?

It turns out the issue was that in my Lab Management environment setup I had set the roles of both the IIS server and the SharePoint server to Web Server.

My automated test plan in MTM was set to run automated tests on the Web Server role, so it sent 50% of the tests to each of the available servers. The tests were run by the Lab Agent (not the deployment agent), which was running as the Network Service machine accounts, e.g. Proj\ProjIIS75$ and Proj\ProjSp2010$. Only the former of these had been granted access to the SQL DB (it was the account being used for the AppPool), hence half the tests failed with DB access issues.

I had two options here: grant both machine accounts access, or alter my Lab Environment. I chose the latter and put the two boxes in different roles.

image

I then had to load the test plan in MTM so it was updated with the changes

image

Once this was done my tests then ran as expected.

Summary

So I now have a Release Management deployment plan that works for a network isolated environment. I can run integration tests, and will soon add some CodedUI ones; it should only be a case of editing the test plan.

It is an interesting question how well Release Management, in its current form, works with Lab Management when SCVMM/network isolated environments are involved; it is certainly not its primary use case, but it can be done, as this post shows. It certainly provides more options than the TFS Lab Management build template we used to use, and provides an easy way to extend the process to manage deployment to production.

Getting started with Release Management with network isolated Lab Management environments

Our testing environments are based on TFS Lab Management; historically we have managed deployment into them manually (or at least via a PowerShell script run manually) or using TFS Build. I thought it time I at least tried to move over to Release Management.

The process to install the components of Release Management is fairly straightforward; there are wizards that ask for little other than which account to run as:

  • Install the deployment server, pointing at a SQL instance
  • Install the management client, pointing at the deployment server
  • Install the deployment agent on each box you wish to deploy to, again pointing it at the deployment server

I hit a problem with the third step. Our lab environments are usually network isolated, hence each can potentially be running its own copy of the same domain. This means the connection from the deployment agent to the deployment server is cross domain. We don’t want to set up cross domain trusts because:

  1. cross domain trusts are a pain to manage
  2. as we have multiple copies of environments, there is more than one copy of some domains – all very confusing for cross domain trusts

So this means you have to use shadow accounts, as detailed in MSDN for Release Management. The key with this process is to make sure you manually add the accounts in Release Management (step 2) – I missed this at first as it differs from what you usually need to do.

To resolve this issue, add the Service User accounts in Release Management. To do this, follow these steps:

    1. Create a Service User account for each deployment agent in Release Management. For example, create the following:

      Server1\LocalAccount1
      Server2\LocalAccount1
      Server3\LocalAccount1

    2. Create an account in Release Management, and then assign to that account the Service User and Release Manager user rights. For example, create Release_Management_server\LocalAccount1.
    3. Run the deployment agent configuration on each deployment computer.
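Creating the matching local accounts on each machine is quick enough from an elevated prompt. A minimal sketch, where the account name and password are placeholders and the same name/password pair must be used on every machine involved:

# Placeholder credentials – create the local shadow account and make it an administrator
net user LocalAccount1 P@ssw0rd1 /add /expires:never
net localgroup Administrators LocalAccount1 /add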

However I still had a problem: I entered the correct details in the deployment configuration client, but got the error

image

The logs showed

Received Exception : Microsoft.TeamFoundation.Release.CommonConfiguration.ConfigurationException: Failed to validate Release Management Server for Team Foundation Server 2013.
   at Microsoft.TeamFoundation.Release.CommonConfiguration.DeployerConfigurationManager.ValidateServerUrl()
   at Microsoft.TeamFoundation.Release.CommonConfiguration.DeployerConfigurationManager.ValidateAndConfigure(DeployerConfigUpdatePack updatePack, DelegateStatusUpdate statusListener)
   at System.ComponentModel.BackgroundWorker.WorkerThreadStart(Object argument)

A quick look using WireShark showed it was trying to access http://releasemgmt.mydomain.com:1000/configurationservice.asmx. If I tried to access this URL in a browser I got:

Server Error in '/' Application.
--------------------------------------------------------------------------------

Request format is unrecognized.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.Web.HttpException: Request format is unrecognized.

Turns out the issue was that I needed to run the deployment client configuration tool as the shadow user account, not as any other local administrator.
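The simplest way I know to do that is to start a command prompt as the shadow account with runas and launch the configuration tool from inside that session; the machine and account names here are placeholders, and the tool’s install path varies by Release Management version:

# Start a prompt as the shadow account, then run the deployment agent
# configuration tool from inside that session
runas /user:WEBVM\LocalAccount1 cmd.exe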

Once I did this the configuration worked and the management console could scan for the client. So now I can really start to play…

Fix for intermittent connection problem in lab management – restart the test controller

Just had a problem with a TFS 2012 Lab Management deployment build. It was working this morning, deploying two web sites via MSDeploy and a DB via a DacPac, then running some CodedUI tests. However, when I tried a new deployment this afternoon it kept failing with the error:

The deployment task was aborted because there was a connection failure between the test controller and the test agent.

image

If you watched the build deployment via MTM you could see it start OK, then the agent went offline after a few seconds.

Turns out the solution was the old favourite: a reboot of the Build Controller. I would like to know why it was giving this intermittent problem though.

Update 14th Jan: An alternative solution to rebooting is to add a hosts file entry on the VM running the test agent for the IP address of the test controller. It seems the problem is name resolution, but I am not sure why it occurs.
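If you want to script that hosts file change on the test agent VM, something like the following works; the IP address and controller name are placeholders:

# Placeholder IP and name – map the test controller so the agent does not depend on DNS
Add-Content -Path "$env:windir\System32\drivers\etc\hosts" -Value "192.168.10.20 testcontroller.corp.domain"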

When your TFS Lab test agents can’t start check the DNS

Lab Management has a lot of moving parts, especially if you are using SCVMM-based environments. All the parts have to communicate if the system is to work.

One of the most common problems I have seen is DNS issues. A slowly propagating DNS can cause chaos, as the test controller will not be able to resolve the names of the dynamically registered lab VMs.

The best fix is to sort out your DNS issues, but that is not always possible (some things just take the time they take, especially on large WANs).

An immediate fix is to use the local hosts file on the test controller to define IP addresses for the lab[guid].corp.domain names created when using network isolation. Once this is done the handshake between the controller and agent is usually possible.

If it isn’t, then you are back to all the usual diagnostic tools.
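The first checks I tend to run from the test controller are along these lines; Resolve-DnsName needs Windows 8/Server 2012 or later, and the lab VM name is a placeholder:

# Can the controller resolve and reach the lab VM's dynamically registered name?
Resolve-DnsName "lab1234abcd.corp.domain"
Test-Connection "lab1234abcd.corp.domain" -Count 2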

Building VMs for use in TFS Lab Management environments

We have recently gone through an exercise to provide a consistent set of prebuilt configured VMs for use in our TFS Lab Management environments.

This is not an insignificant piece of work, as this post by Rik Hepworth discusses, detailing all the IT pro work he had to do to create them. This is all before we even think about the work required to create deployment TFS builds and the like.

It is well worth a read if you are planning to provision a library of VMs for Lab Management, as it has some really useful tips and tricks.

Why is my TFS Lab build picking a really old build to deploy?

I was working on a TFS lab deployment today and to speed up testing I set it to pick the <Latest> build of the TFS build that actually builds the code, as opposed to queuing a new one each time. It took me ages to remember/realise why it kept trying to deploy some ancient build that had long since been deleted from my test system.

image

The reason was that <Latest> means the last successful build, not the last partially successful build. I had a problem with my code build that meant a test was failing (a custom build activity on my build agent was the root cause of the issue). Once I fixed my build box so the code build did not fail, the lab build was able to pick up the files I expected and deploy them.

Note that if you tell the lab deploy build to queue its own build it will attempt to deploy that build even if it is only partially successful.

Fix for ‘Cannot install test agent on these machines because another environment is being created using the same machines’

I recently posted about adding extra VMs to existing environments in Lab Management. Whilst following this process I hit a problem: I could not create my new environment because of a failure at the verify stage. It was fine adding the new VMs, but the one I was reusing gave the error ‘Microsoft test manager cannot install test agent on these machines because another environment is being created using the same machines’.

image

I had seen this issue before, so I tried a variety of things that had sorted it in the past: removing the TFS agent on the VM, manually installing and trying to configure it, reading through the Test Controller logs, all to no effect. I eventually got a solution today with the help of Microsoft.

The answer was to do the following on the VM showing the problem (a scripted version follows the list):

  1. Kill TestAgentInstaller.exe process, if running on failing machine
  2. Delete the “TestAgentInstaller” service using the sc delete testagentinstaller command (gotcha here: use a DOS-style command prompt, not PowerShell, as sc means something different in PowerShell – it is an alias for Set-Content – so if using PowerShell you need the full path to sc.exe)
  3. Delete c:\Windows\VSTLM_RES folder
  4. Restart machine and then try Lab Environment creation again and all should be OK
  5. As usual once the environment is created you might need to restart it to get all the test agents to link up to the controller OK
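If you prefer to script the clean-up, the same steps look roughly like this from an elevated PowerShell prompt (note the explicit sc.exe, for the reason given in step 2):

# Kill the installer if it is still running (ignore errors if it is not)
Stop-Process -Name TestAgentInstaller -Force -ErrorAction SilentlyContinue

# Delete the service – it must be sc.exe here, as plain 'sc' is an alias for Set-Content
sc.exe delete TestAgentInstaller

# Remove the leftover Lab Management resources folder, then reboot and retry
Remove-Item "C:\Windows\VSTLM_RES" -Recurse -Force -ErrorAction SilentlyContinue
Restart-Computer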

So it seems that the removal of the VM from its old environment left some debris that was confusing the verify step. It seems this only happens rarely, but it can be a bit of a showstopper if you can’t get around it.