But it works on my PC!

The random thoughts of Richard Fennell on technology and software development

Running Test Suites within a network Isolated Lab Management environment when using TFS vNext build and release tooling

Updated 27 Sep 2016: Added solutions to known issues

Background

As I have posted many times, we make use of TFS Lab Management to provide network isolated dev/test environments. Going forward I see us moving to Azure DevTest Labs and/or Azure Stack with ARM templates, but that isn't going to help me today, especially when I have already made the investment in setting up Lab Management environments and they are ready to use.

One change we are making now is a move from the old TFS Release Management (2013 generation) to the new VSTS and TFS 2015.2 vNext release tools. This means I need to be able to trigger automated tests on VMs within Lab Management network isolated environments from a step inside my new build/release process. I have posted on how to do this with the older generation Release Management tools; it turns out it is in some ways a little simpler with the newer tooling, with no need to fiddle with shadow accounts and the like.

My Setup

image

Constraints

The constraints are these:

  • I need to be able to trigger tests on the Client VM in the network isolated lab environment. These tests are all defined in automated test suites within Microsoft Test Manager.
  • The network isolated lab already has a TFS Test Agent deployed on all the VMs in the environment, linked back to the TFS Test Controller on my corporate domain. These agents are automatically installed and managed, and handle the ‘magic’ for the network isolation – we can’t fiddle with them without breaking the labs.
  • The new build/release tools assume that you will auto-deploy a 2015 generation Test Agent via a build task as part of the build/release process. This is a fresh test agent install, so it removes any already installed Test Agent – we don’t want this as it breaks the existing agent/network isolation.
  • So my only option is to trigger the tests using TCM (as we did in the past) from some machine in the system. In the past (with the old tools) this had to be within the isolated network environment due to the limitation put in place by the use of shadow accounts.
  • However, TCM (as shipped with VS 2015) does not ‘understand’ vNext builds, so it can’t find them by definition name/number – we have to find builds by their drop location, and I think this needs to be a UNC share, not a drop back onto the TFS server (see the example after this list). So using TCM.EXE (and any wrapper scripts) probably is not going to deliver what I want, i.e. a test run associated with a vNext build and/or release.
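For reference, the closest TCM gets is something like the command below, where the build has to be identified by its UNC drop folder rather than by a vNext definition name/number (all the IDs, names and paths here are placeholder values, not from my real system):

    tcm.exe run /create /title:"Smoke Tests" /planid:123 /suiteid:456 /configid:7 /collection:http://tfsserver.domain.com:8080/tfs/defaultcollection /teamproject:"My Project" /builddir:\\store\drops\MyBuild_20160927.1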
My Solution

The solution I adopted was to write a PowerShell script that performs the same function as the TCMEXEC.PS1 script that the older Release Management products used to run within the network isolated Lab Environment.

The difference is that the old script shelled out to run TCM.EXE; my new version makes calls to the new TFS REST API (and unfortunately also to the older C# API, as some features, notably those for Lab Management services, are not exposed via REST). This script can be run from anywhere. I chose to run it on the TFS vNext build agent, as this is easiest and that machine already had Visual Studio installed, so the TFS C# API was available.
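To give a flavour of the REST side, looking up the ID of a test plan from its friendly name boils down to something like this (a minimal sketch assuming on-premises TFS with Windows authentication; the api-version and property names are from the 2015-era API, so check them against your server):

    # Find the ID of a test plan from its friendly name via the TFS REST API
    $collectionUri = "http://tfsserver.domain.com:8080/tfs/defaultcollection"
    $teamProject   = "My Project"
    $plans  = Invoke-RestMethod -Uri "$collectionUri/$teamProject/_apis/test/plans?api-version=1.0" -UseDefaultCredentials
    $planId = ($plans.value | Where-Object { $_.name -eq "My test plan" }).id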

    You can find this script on my VSTSPowerShell GitHub Repo.

The usage of the script is:

TCMReplacement.ps1
    -Collectionuri http://tfsserver.domain.com:8080/tfs/defaultcollection/
    -Teamproject "My Project"
    -testplanname "My test plan"
    -testsuitename "Automated tests"
    -configurationname "Windows 8"
    -buildid 12345
    -environmentName "Lab V.2.0"
    -testsettingsname "Test Setting"
    -testrunname "Smoke Tests"
    -testcontroller "mytestcontroller.domain.com"
    -releaseUri "vstfs:///ReleaseManagement/Release/167"
    -releaseenvironmenturi "vstfs:///ReleaseManagement/Environment/247"

    Note

  • The last two parameters are optional; all the others are required. If the last two are not used the test results will not be associated with a release.
  • There is also a pollinginterval parameter which defaults to 10 seconds. The script starts a test run then polls on this interval to see if it has completed.
  • If there are any failed tests then the script calls Write-Error, so the TFS build process sees this as a failed step.
  • In some ways I think this script is an improvement over the TCMEXEC script. The old one needed you to know the IDs for many of the settings (loads of poking around in Microsoft Test Manager to find them); I allow the common names of settings to be passed in, which I then use to look up the required values via the APIs (this is where I needed to use the older C# API, as I could not find a way to get the Configuration ID, Environment ID or Test Settings ID via REST – see the sketch below).
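The C# API lookup mentioned in the last point is along these lines (a rough sketch only: it assumes the TFS 2015 client assemblies are available, for example because Visual Studio is installed on the machine, and the assembly loading and member names may need adjusting for your setup):

    # Load the TFS client object model (assumes the assemblies are in the GAC via Visual Studio/Team Explorer)
    Add-Type -AssemblyName "Microsoft.TeamFoundation.Client"
    Add-Type -AssemblyName "Microsoft.TeamFoundation.TestManagement.Client"

    # Connect to the collection and get the Test Management service for the team project
    $tfs = [Microsoft.TeamFoundation.Client.TfsTeamProjectCollectionFactory]::GetTeamProjectCollection([Uri]"http://tfsserver.domain.com:8080/tfs/defaultcollection")
    $testService = $tfs.GetService([Microsoft.TeamFoundation.TestManagement.Client.ITestManagementService])
    $project = $testService.GetTeamProject("My Project")

    # Translate the friendly configuration name into the ID the test run needs
    $configId = ($project.TestConfigurations.Query("SELECT * FROM TestConfiguration") |
        Where-Object { $_.Name -eq "Windows 8" }).Id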

There is nothing stopping you running this script from the command line, but I think it is more likely you will make it part of a release pipeline using the PowerShell on local machine task in the build system. When used this way you can get many of the parameters from environment variables, so the command arguments become something like the following (and of course you can make all the string values build variables too if you want):

     

    -Collectionuri $(SYSTEM.TEAMFOUNDATIONCOLLECTIONURI)
    -Teamproject $(SYSTEM.TEAMPROJECT)
    -testplanname "My test plan"
    -testsuitename "Automated tests"
    -configurationname "Windows 8"
    -buildid $(BUILD.BUILDID)
    -environmentName "Lab V.2.0"
    -testsettingsname "Test Settings"
    -testrunname "Smoke Tests"
    -testcontroller "mytestcontroller.domain.com"
    -releaseUri $(RELEASE.RELEASEURI)
    -releaseenvironmenturi $(RELEASE.ENVIRONMENTURI)

     

Obviously this script is potentially a good candidate for a TFS build/release task, but as per my usual practice I will make sure I am happy with its operation before wrapping it up into an extension.

    Known Issues

  • If you run the script from the command line targeting a completed build and release, the tests run and are shown in the release report as well as on the test tab, as we would expect.

    image

However, if you trigger the test run from within a release pipeline, the tests run OK and you can see the results in the test tab (and MTM), but they are not associated with the release. My guess is that this is because the release has not completed when the data update is made. I am investigating how to address this issue.
  • Previously I reported a known issue where the test results were associated with the build, but not the release. It turns out this was because the AD account the build/release agent was running as was missing rights on the TFS server. To fix the problem I made sure the account was configured as follows:

Once this was done all the test results appeared where they should.

So hopefully you will find this a useful tool if you are using network isolated environments and TFS build.

    Our TFS Lab Management Infrastructure

After my session at Techorama last week I have been asked some questions about how we built our TFS Lab Management infrastructure. Well, here is a bit more detail; thanks to Rik for helping correct what I had misremembered and providing much of the detail.

    image

For SQL we have two physical servers with Intel processors. Each has a pair of mirrored disks for the OS and a RAID5 group of disks for data. We use SQL 2012 Enterprise Always On for replication to keep the DBs in sync. The servers are part of a Windows cluster (needed for Always On) and we use a VM to give a third server in the witness role. This is hosted on a production Hyper-V cloud. We have a number of availability groups on this platform, basically one per service we run. This allows us to split the read/write load between the two servers (unless they have failed over to a single box). If we had only one availability group for all the DBs, one node would be doing all the read/write work and the other read only, so not that balanced.

SCVMM runs on a physical server with a pair of hardware-mirrored 2TB disks giving 2TB of storage. That’s split into two partitions, as you can’t use data de-duplication on the OS volume of Windows. This allows us to have something like 5TB of Lab VM images stored on the SCVMM library share that’s hosted on the SCVMM server. This share is for lab management use only.

We also have two physical servers that make up a Windows cluster with a Cluster Shared Volume on an iSCSI SAN. This hosts a number of SCVMM libraries for ISO images, production VM images and test stuff. Data de-duplication again is giving us an 80% space saving on the SAN (ISO images of OSes and VHDs of installed OSes dedupe _really_ well).
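For anyone wanting to do the same, turning on de-duplication for a library volume is only a couple of PowerShell commands (a sketch; the drive letter is a placeholder and the Data Deduplication feature must already be installed on the server):

    # Enable Windows Server data de-duplication on the SCVMM library volume and kick off an optimisation pass
    Import-Module Deduplication
    Enable-DedupVolume -Volume "E:"
    Start-DedupJob -Volume "E:" -Type Optimization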

Our Lab cloud currently has three AMD-based servers. They use the same disk setup as the SQL boxes, with a mirrored pair for the OS and RAID5 for VM storage.

    Our Production Hyper-V also has three servers, but this time in a Windows Cluster using a Cluster Shared Volume on our other iSCSI SAN for VM storage so it can do automated failover of VMs.

Each of the SQL servers, SCVMM servers and Lab Hyper-V servers uses Windows Server 2012 R2 NIC teaming to combine 2 x 1Gbit NICs, which gives us better throughput and failover. The lab servers have one team for VM traffic and one team for the Hyper-V management traffic that is used when deploying VMs. That means we can push VMs around pretty much as fast as the disks will push data in either direction, without needing expensive 10Gbit Ethernet.
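Creating a team is a one-liner per team with the in-box cmdlets (a sketch; the team and adapter names are placeholders, and the teaming mode needs to match your switch configuration):

    # Create the VM traffic and Hyper-V management teams from pairs of 1Gbit NICs
    New-NetLbfoTeam -Name "VMTraffic"  -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -Confirm:$false
    New-NetLbfoTeam -Name "Management" -TeamMembers "NIC3","NIC4" -TeamingMode SwitchIndependent -Confirm:$false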

    So I hope that answers any questions.

    What I learnt getting Release Management running with a network Isolated environment

    Updated 20 Oct 2014 – With notes on using an Action for cross domain authentication

In my previous post I described how to get a network isolated environment up and running with Release Management; it is all to do with shadow accounts. Well, getting it running is one thing, having a useful release process is another.

    For my test environment I needed to get three things deployed and tested

    • A SQL DB deployed via a DACPAC
    • A WCF web service deployed using MSDeploy
    • A web site deployed using MSDeploy

    My environment was a four VM network isolated environment running on our TFS Lab Management system.

    image 

    The roles of the VMs were

    • A domain controller
    • A SQL 2008R2 server  (Release Management deployment agent installed)
    • A VM configured as a generic IIS web server (Release Management deployment agent installed)
    • A VM configured as an SP2010 server (needed in the future, but its presence caused me issues so I will mention it)

    Accessing domain shares

The first issue we encountered was that we need the deployment agent on the VMs to be able to access domain shares on our corporate network, not just ones on the local network isolated domain. They need to be able to do this to download the actual deployment media. The easiest way I found to do this was to place a NET USE command at the start of the workflow for each VM I was deploying to. This allowed authentication from the test domain to our corporate domain and hence gave the agent access to the files it needed. The alternatives would have been using more shadow accounts, or cross domain trusts, both things I did not want the hassle of managing.

    image

The run command line activity runs the net command with the arguments:

    use \\store\dropsshare [password] /user:[corpdomain\account]

I needed to use this command on each VM I was running the deployment agent on, so it appears twice in this workflow, once for the DB server and once for the web server.

Updated 20 Oct 2014: After using this technique in a few releases I realised it was a far better idea to have an action to do the job. The technique I mentioned above meant the password was in clear text; a parameterised action allows it to be encrypted.

To create an action (and it can be an action, not a component, as it does not need to know the build location) you need the following settings:

    image
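In outline, the action boils down to the same net use call but driven by Release Management parameters (the double-underscore tokens below are parameter names of my own choosing; making the password parameter an encrypted type is what keeps it out of clear text):

    Command:    net
    Arguments:  use \\__ShareName__ __SharePassword__ /user:__ShareUser__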

    Version of SSDT SQL tools

My SQL instance was SQL 2008R2; when I tried to use the standard Release Management DACPAC Database Deployer tool it failed with assembly load errors. Basically the assemblies downloaded as part of the tool deployment did not match anything on the VM.

My first step was to install the latest SQL 2012 SSDT tools on the SQL VM. This did not help, as there was still a mismatch between the assemblies. I therefore created a new tool in the Release Management inventory; this was a copy of the existing DACPAC tool command, but using the current version of the tool assemblies from SSDT 2012.

    image

Using this version of the tools worked; my DB could be deployed/updated.

    Granting Rights for SQL

Using SSDT to deploy a DB (especially if you have the package set to drop the DB) does not grant any user access rights.

I found the easiest way to grant the rights the web service AppPool accounts needed was to run a SQL script. I did this by creating a component for my release with a small block of SQL to create DB owners; this is the same technique as used for the standard SQL create/drop activities shipped in the box with Release Management.

The arguments I used for sqlcmd were:

    -S __ServerName__ -b -Q "use __DBname__ ; create user [__username__] for login [__username__]; exec sp_addrolemember 'db_owner', '__username__';"

    image

Once I had created this component I could pass the parameters needed to add the DB owners.
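With the placeholders filled in, the resolved command ends up looking something like this (the server, DB and account names are made up for illustration):

    sqlcmd -S SQLVM01 -b -Q "use MyAppDb ; create user [PROJ\WebAppPool] for login [PROJ\WebAppPool]; exec sp_addrolemember 'db_owner', 'PROJ\WebAppPool';"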

    Creating the web sites

This was straightforward; I just used the standard components to create the required AppPools and the web sites. It is worth noting that these commands can be run against an existing site; they don’t error if the site/AppPool already exists. This seems to be the standard model with Release Management: as there is no decision (if) branching in the workflow, all tools have to either work or stop the deployment.

    image

I then used the irmsdeploy.exe Release Management component to run the MSDeploy publish on each web site/service.

    image

A note here: you do need to make sure you set the path to the package to the actual folder the .ZIP file is in, not the parent drop folder (in my case Lab\_PublishedWebsites\SABSTestHarness_Package, not Lab).

    image

    Running some integration tests

    We now had a deployment that worked. It pulled the files from our corporate LAN and deployed them into a network isolated lab environment.

    I now wanted to run some tests to validate the deployment. I chose to use some SQL based tests that were run via MSTest. These tests had already been added to Microsoft Test Manager (MTM) using TCM, so I thought I had all I needed.

    I added the Release Management MTM component to my workflow and set the values taken from MTM for test plan and suite etc.

    image

However I quickly hit cross domain authentication issues again. The Release Management component does all this test management via a PowerShell script that runs TCM. This must communicate with TFS, which in my system was in the other domain, so it fails.

The answer was to modify the PowerShell script to also pass some login credentials.

    image

    The only change in the PowerShell script was that each time the TCM command is called the /login:$LoginCreds block is added, where $LoginCreds are the credentials passed in the form corpdomain\user,password

    $testRunId = & "$tcmExe" run /create /title:"$Title" /login:$LoginCreds /planid:$PlanId /suiteid:$SuiteId /configid:$ConfigId /collection:"$Collection" /teamproject:"$TeamProject" $testEnvironmentParameter $buildDirectoryParameter $buildDefinitionParameter $buildNumberParameter $settingsNameParameter $includeParameter
       

An interesting side note: if you run the TCM command at a command prompt you only need to provide the credentials the first time it is run; they are cached. This does not seem to be the case inside the Release Management script – TCM is run three times, and each time you need to pass the credentials.

Once this was in place, and suitable credentials added to the workflow, I expected my tests to run. They did, but 50% failed – why?

It turns out the issue was that in my Lab Management environment setup I had set the roles of both the IIS server and the SharePoint server to Web Server.

My automated test plan in MTM was set to run automated tests on the Web Server role, so it sent 50% of the tests to each of the available servers. The tests were run by the Lab Agent (not the deployment agent), which was running as the Network Service machine accounts, e.g. Proj\ProjIIS75$ and Proj\ProjSp2010$. Only the former of these had been granted access to the SQL DB (it was the account being used for the AppPool), hence half the tests failed with DB access issues.

I had two options here: grant both machine accounts access, or alter my Lab Environment. I chose the latter, and put the two boxes in different roles.

    image

    I then had to load the test plan in MTM so it was updated with the changes

    image

    Once this was done my tests then ran as expected.

    Summary

So I now have a Release Management deployment plan that works for a network isolated environment. I can run integration tests, and will soon add some CodedUI ones; it should only be a case of editing the test plan.

It is an interesting question how well Release Management, in its current form, works with Lab Management when it is SCVMM/network isolated environment based; it is certainly not its primary use case, but it can be done, as this post shows. It certainly provides more options than the TFS Lab Management build template we used to use, and does provide an easy way to extend the process to manage deployment to production.

    Getting started with Release Management with network isolated Lab Management environments

    Our testing environments are based on TFS Lab Management, historically we have managed deployment into them manually (or at least via a PowerShell script run manually) or using TFS Build. I thought it time I at least tried to move over to Release Management.

The process to install the components of Release Management is fairly straightforward; there are wizards that ask little other than which account to run as:

    • Install the deployment server, pointing at a SQL instance
    • Install the management client, pointing at the deployment server
• Install the deployment agent on each box you wish to deploy to, again pointing it at the deployment server

I hit a problem with the third step. Our lab environments are usually network isolated, hence each can potentially be running its own copy of the same domain. This means the connection from the deployment agent to the deployment server is cross domain. We don’t want to set up cross domain trusts because:

    1. we don’t want cross domain trusts, they are a pain to manage
    2. as we have multiple copies of environments there are more than one copy of some domains – all very confusing for cross domain trusts

So this means you have to use shadow accounts, as detailed in MSDN for Release Management. The key with this process is to make sure you manually add the accounts in Release Management (step 2) – I missed this at first as it differs from what you usually need to do.

    To resolve this issue, add the Service User accounts in Release Management. To do this, follow these steps:

      1. Create a Service User account for each deployment agent in Release Management. For example, create the following:

        Server1\LocalAccount1
        Server2\LocalAccount1
        Server3\LocalAccount1

      2. Create an account in Release Management, and then assign to that account the Service User and Release Manager user rights. For example, create Release_Management_server\LocalAccount1.
      3. Run the deployment agent configuration on each deployment computer.
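As a rough illustration, step 1 on each deployment VM amounts to something like this, run from an elevated prompt (the account name and password are placeholders, and the same name/password pair must be used on every box; the account typically also needs to be in the local Administrators group so the deployment agent can do its work):

    net user LocalAccount1 Pa55w.rd1234 /add
    net localgroup Administrators LocalAccount1 /add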

However I still had a problem: I entered the correct details in the deployment configuration client, but got the error

    image

    The logs showed

    Received Exception : Microsoft.TeamFoundation.Release.CommonConfiguration.ConfigurationException: Failed to validate Release Management Server for Team Foundation Server 2013.
       at Microsoft.TeamFoundation.Release.CommonConfiguration.DeployerConfigurationManager.ValidateServerUrl()
       at Microsoft.TeamFoundation.Release.CommonConfiguration.DeployerConfigurationManager.ValidateAndConfigure(DeployerConfigUpdatePack updatePack, DelegateStatusUpdate statusListener)
       at System.ComponentModel.BackgroundWorker.WorkerThreadStart(Object argument)

A quick look using Wireshark showed it was trying to access http://releasemgmt.mydomain.com:1000/configurationservice.asmx. If I tried to access this URL in a browser I got:

    Server Error in '/' Application.
    --------------------------------------------------------------------------------

    Request format is unrecognized.
    Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
    Exception Details: System.Web.HttpException: Request format is unrecognized.

It turns out the issue was that I needed to run the deployment client configuration tool as the shadow user account, not as any other local administrator.

    Once I did this the configuration worked and the management console could scan for the client. So now I can really start to play…

    Fix for intermittent connection problem in lab management – restart the test controller

    Just had a problem with a TFS 2012 Lab Management deployment build. It was working this morning, deploying two web sites via MSDeploy and a DB via a DacPac, then running some CodedUI tests. However, when I tried a new deployment this afternoon it kept failing with the error:

    The deployment task was aborted because there was a connection failure between the test controller and the test agent.

    image

If you watched the build deployment via MTM you could see it start OK, then the agent went offline after a few seconds.

Turns out the solution was the old favourite, a reboot of the Test Controller. I would like to know why it was giving this intermittent problem though.

Update 14th Jan: An alternative solution to rebooting is to add a hosts file entry, for the IP address of the test controller, on the VM running the test agent. It seems the problem is name resolution, but I am not sure why it occurs.

    When your TFS Lab test agents can’t start check the DNS

Lab Management has a lot of moving parts, especially if you are using SCVMM based environments. All the parts have to communicate if the system is to work.

Some of the most common problems I have seen are due to DNS issues. A slowly propagating DNS change can cause chaos, as the test controller will not be able to resolve the names of the dynamically registered lab VMs.

    The best fix is to sort out your DNS issues, but that is not always possible (some things just take the time they take, especially on large WANs).

An immediate fix is to use the local hosts file on the test controller to define IP addresses for the lab[guid].corp.domain names created when using network isolation. Once this is done the handshake between the controller and agent is usually possible.
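On the test controller that is something like the following (a sketch; the GUID-style name and IP address are placeholders for whatever your isolated lab actually registers):

    # Append a hosts entry for an isolated lab VM so the test controller can resolve it
    Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Value "192.168.10.21  lab1a2b3c4d5e.corp.domain.com"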

    If it isn’t then you are back to all the usually diagnostics tools

    Building VMs for use in TFS Lab Management environments

    We have recently gone through an exercise to provide a consistent set of prebuilt configured VMs for use in our TFS Lab Management environments.

This is not an insignificant piece of work, as this post by Rik Hepworth discusses, detailing all the IT pro work he had to do to create them. This is all before we even think about the work required to create deployment TFS builds and the like.

It is well worth a read if you are planning to provision a library of VMs for Lab Management, as it has some really useful tips and tricks.