But it works on my PC!

The random thoughts of Richard Fennell on technology and software development

Automating TFS Build Server deployment with SCVMM and PowerShell

Rik recently posted about the work we have done to automatically provision TFS build agent VMs. This has come out of us having about 10 build agents on our TFS server, all doing different jobs with different SDKs etc. When we needed to increase capacity for a given build type we had problems: could another agent run the build? What exactly was on the agent anyway? An audit of the boxes made for horrible reading; they were very inconsistent.

So Rik automated the provisioning of new VMs and I looked at providing a PowerShell script to install the base tools we needed on our build agents, knowing this list is going to change a good deal over time. After some thought, for our first attempt we picked:

  • TFS itself (to provide the 2013.2 agent)
  • Visual Studio 2013.2 – you know you always end up installing it in the end to get SSDT, SDK and MSBuild targets etc.
  • WIX 3.8
  • Azure SDK 2.3 for Visual Studio 2013.2 – virtually all our current projects need this. This is actually why we have had capacity issues on the old build agents, as it was only installed on one.

Given this basic set of tools we can probably build 70-80% of our solutions. If we use this as the base for all build boxes we can then manually add extra tools where required, but we expect we will just end up adding to the list of items installed on all our build boxes, assuming the cost of installing the extra tools/SDKs is not too high. We will also try to auto-deploy tools as part of our build templates where possible, again reducing what needs to be placed on any given build agent.

Now the script I ended up with is a bit rough and ready, but it does the job. I think a move to DSC might help with this process in the future, but I did not have time to write the custom resources now. I am assuming this script is going to be a constant work in progress as it is modified for new releases and tools. I did make the effort to have all the steps check whether they need to be done, thus allowing the re-running of the script to ‘repair’ a build agent. All the writing to the event log is to make life easier for Rik when working out what is going on with the script, especially useful as the installs from ISOs are a bit slow to run.

 

# make sure we have a working event logger with a suitable source
Create-EventLogSource -logname "Setup" -source "Create-TfsBuild"
write-eventlog -logname Setup -source Create-TfsBuild -eventID 6 -entrytype Information -message "Create-Tfsbuild started"

# add build service as local admin, not essential but make life easier for some projects
Add-LocalAdmin -domain "ourdomain" -user "Tfsbuilder"





# Install TFS, by mounting the ISO over the network and running the installer

# The command & ($isodrive + ":\tfs_server.exe") /quiet is run

# In the function a while loop waits for the tfsconfig.exe file to appear and assumes the installer is done when it does – dirty but it works

# This also lets me use write-progress to give some indication of when the install is done.
Write-Output "Installing TFS server"
Add-Tfs "\\store\ISO Images\Visual Studio\2013\2013.2\en_visual_studio_team_foundation_server_2013_with_update_2_x86_x64_dvd_4092433.iso"

Write-Output "Configuring TFS Build"
# clear out any old config – I found this helped avoid errors when re-running the script

# A System.Diagnostics.ProcessStartInfo object is used to run the tfsconfig command with the argument "setup /uninstall:All"

# ProcessStartInfo is used so we can capture the error output and log it to the event log if required
Unconfigure-Tfs


# and reconfigure, again using tfsconfig, this time with the argument "unattend /configure  /unattendfile:config.ini", where

# the config.ini has been created with tfsconfig unattend /create flag (check MSDN for the details)
Configure-Tfs "\\store\ApplicationInstallers\TFSBuild\configsbuild.ini"

# install vs2013, again by mounting the ISO running the installer, with a loop to check for a file appearing
Write-Output "Installing Visual Studio"
Add-VisualStudio "\\store\ISO Images\Visual Studio\2013\2013.2\en_visual_studio_premium_2013_with_update_2_x86_dvd_4238022.iso"

# install wix by running the exe with the -q option via ProcessStartInfo again.
Write-Output "Installing Wix"
Add-Wix "\\store\ApplicationInstallers\wix\wix38.exe"

# install azure SDK using the Web Platform Installer, checking if the Web PI is present first and installing it if needed

# The Web PI installer lets you ask to reinstall a package; if it is already installed it just ignores the request, so you don’t need to check whether the Azure SDK is already present
Write-Output "Installing Azure SDK"
Add-WebPIPackage "VWDOrVs2013AzurePack"

write-eventlog -logname Setup -source Create-TfsBuild -eventID 7 -entrytype Information -message "Create-Tfsbuild ended"
Write-Output "End of script"

 

So for a first pass this seems to work. I now need to make sure all our builds can use this cut-down build agent; if they can’t, do I need to modify the build template? Or do I need to add more tools to our standard install? Or decide if it is going to need a special agent definition?

Once this is all done the hope is that when all the TFS build agents need patching for TFS 2013.x we will just redeploy new VMs, or run a modified script to silently do the update. We shall see if this delivers on that promise.

Could not load file or assembly 'Microsoft.TeamFoundation.WorkItemTracking.Common, Version=12.0.0.0' when running a build on a new build agent on TFS 2013.2

I am currently rebuilding our TFS build infrastructure; we have too many build agents that are just too different, and they don’t need to be. So I am looking at a standard set of features on a build agent and the ability to auto-provision new instances to make scaling easier. More on this in a future post…

Anyway whilst testing a new agent I had a problem. A build that had worked on a previous test agent failed with the error

Could not load file or assembly 'Microsoft.TeamFoundation.WorkItemTracking.Common, Version=12.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)

The log showed it was failing to even do a get latest of the files to build, or anything on the build agent.


Turns out the issue was that the PowerShell script that installed all the TFS build components and SDKs had failed when trying to install the Azure SDK for VS2013: the Web Platform Installer was not present, so when the script tried to use the command line installer to add this package it failed.

I fixed the issue with the Web PI tools and re-ran the command line to install the Azure SDK and all was OK.
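
For reference, the manual fix was just to re-run the Web PI command line for that package once Web PI itself was installed; it was something along these lines (the WebpiCmd.exe path and switches can vary between Web PI versions):

# re-install the Azure SDK package via the Web PI command line
& "$env:ProgramFiles\Microsoft\Web Platform Installer\WebpiCmd.exe" /Install /Products:VWDOrVs2013AzurePack /AcceptEula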

Not sure why this happened; maybe a missing pre-requisite put on by Web PI itself was the issue. I know older versions did have a .NET 3.5 dependency. One to keep an eye on.

Building Azure Cloud Applications on TFS

If you are doing any work with Azure Cloud Applications there is a very good chance you will want your automated build process to produce the .CSPKG deployment file; you might even want it to do the deployment too.

On our TFS build system, it turns out this is not as straightforward as you might hope. The problem is that the MSBuild publish target that creates the files puts them in the $(build agent working folder)\source\myproject\bin\debug folder, unlike the output of the build target, which goes into the $(build agent working folder)\binaries\ folder that gets copied to the build drops location. Hence, though the files are created, they are not accessible to the team along with the rest of the built items.

I have battled to sort this for a while, trying to avoid the need to edit our customised TFS build process template. This is something we try to avoid where possible, favouring environment variables and MSBuild arguments where we can get away with it. There is no point denying that editing build process templates is a pain point on TFS.

The solution – editing the process template

Turns out a colleague had fixed the same problem a few projects ago and the functionality was already hidden in our standard TFS build process template. The problem was that it was not documented; a lesson for all of us that it is a very good idea to put customisation information in a searchable location so others can find customisations that are not immediately obvious. Frankly this is one of the main purposes of this blog: somewhere I can find what I did years later, as I won’t remember the details.

Anyway, the key is to make sure the MSBuild publish target uses the correct location to create the files. This is done using a pair of MSBuild arguments in the advanced section of the build configuration:

  • /t:MyCloudApp:Publish – this tells MSBuild to perform the publish action for just the project MyCloudApp. You might be able to just use /t:Publish if only one project in your solution has a Publish target.
  • /p:PublishDir=$(OutDir) – this is the magic. We pass in the temporary variable $(OutDir). At this point we don’t know the target binary location as it is build agent/instance specific; customisation in the TFS build process template converts this temporary value to the correct path (the combined arguments are shown below).
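
Put together, the MSBuild arguments box in the build definition ends up containing something like this (assuming, as above, that the cloud project is called MyCloudApp):

/t:MyCloudApp:Publish /p:PublishDir=$(OutDir)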

In the build process template, in the Initialize Variable sequence within Run on Agent, add an If activity.


  • Set the condition to MSBuildArguments.Contains("$(OutDir)")
  • Within the true branch add an Assign activity setting the MSBuildArguments variable to MSBuildArguments.Replace("$(OutDir)", String.Format("{0}\{1}\\", BinariesDirectory, "Packages"))

This will swap the $(OutDir) for the correct TFS binaries location within that build.

After that it all just works as expected. The CSPKG file etc. ends up in the drops location.

Other things that did not work (prior to TFS 2013)

I had also looked at running a PowerShell script at the end of the build process, and at adding an AfterPublish target within the MSBuild process (by adding it to the project file manually) that did a file copy. Both these methods suffered from the problem that when the MSBuild command ran it did not know the location to drop the files into. Hence the need for the customisation above.

Now I should point out that though we are running TFS 2013, this project was targeting the TFS 2012 build tools, so I had to use the solution outlined above: a process template edit. However, if we had been using the TFS 2013 process template as our base for customisation then we would have had another way to get around the problem.

TFS 2013 exposes the current build settings as environment variables. This would allow us to use an AfterPublish MSBuild target something like:

<Target Name="CustomPostPublishActions" AfterTargets="AfterPublish" Condition="'$(TF_BUILD_DROPLOCATION)' != ''">
  <Exec Command="echo Post-PUBLISH event: Copying published files to: $(TF_BUILD_DROPLOCATION)" />
  <Exec Command="xcopy &quot;$(ProjectDir)bin\$(ConfigurationName)\app.publish&quot; &quot;$(TF_BUILD_DROPLOCATION)\app.publish&quot; /y " />
</Target>

So maybe a simpler option for the future?

The moral of the story: document your customisations and let your whole team know they exist.

How long is my TFS 2010 to 2013 upgrade going to take – Part 2

Back in January I did a post How long is my TFS 2010 to 2013 upgrade going to take? I have now done some more work with one of the clients and have more data. Specifically, the initial trial was 2010 > 2013 RTM on a single-tier test VM; we have now done a test upgrade from 2010 > 2013.2 on the same VM and also one to a production-quality dual-tier system.


The key lessons are

  • There are about 150 more steps to go from 2013 RTM to 2013.2, so it takes a good deal longer.
  • The dual-tier production hardware is nearly twice as fast at doing the upgrade, though the initial step (step 31, moving the source code) is not that much faster; it is the steps after this that are quicker. We put it down to far better SQL throughput.

Cloning a TFS repository with git-tf gives "A server path must be absolute"

I am currently involved in moving some TFS TFVC hosted source to a TFS Git repository.  The first step was to clone the source for a team project from TFS using the command

git tf clone --deep http://tfsserver01:8080/tfs/defaultcollection ‘$My Project’ localrepo1

and it worked fine. However the next project I tried to move had no space in the source path

git tf clone --deep http://tfsserver01:8080/tfs/defaultcollection ‘$MyProject’ localrepo2

This gave the error

git-tf: A server path must be absolute.

Turns out the problem was the single quotes. Remove these and the command worked as expected

git tf clone --deep http://tfsserver01:8080/tfs/defaultcollection $MyProject localrepo2

Seems you should only use the quotes when there are spaces in a path name.

Updated 11 June – After a bit more thought I think I have tracked down the true cause. It is not actually the single quote, but the fact the command line had been cut and pasted from Word. This meant the quote was a ‘ not a '. Cutting and pasting from Word can always lead to similar problems, but it is still a strange error message; I would have expected an invalid character message.

Our TFS Lab Management Infrastructure

After my session at Techorama last week I have been asked some questions about how we built our TFS Lab Management infrastructure. Well, here is a bit more detail; thanks to Rik for helping correct what I had misremembered and providing much of the detail.


For SQL we have two physical servers with Intel processors. Each has a pair of mirrored disks for the OS and a RAID5 group of disks for data. We use SQL 2012 Enterprise Always On for replication to keep the DBs in sync. The servers are part of a Windows cluster (needed for Always On) and we use a VM, hosted on a production Hyper-V cloud, to give a third server in the witness role. We have a number of availability groups on this platform, basically one per service we run. This allows us to split the read/write load between the two servers (unless they have failed over to a single box). If we had only one availability group for all the DBs, one node would be doing all the read/write and the other read-only, so not that balanced.

SCVMM runs on a physical server with a pair of hardware-mirrored 2Tb disks for 2Tb of storage. That’s split into two partitions, as you can’t use data de-duplication on the OS volume of Windows. This allows us to have something like 5Tb of Lab VM images stored on the SCVMM library share that’s hosted on the SCVMM server. This share is for lab management use only.

We also have two physical servers that make up a Windows cluster with a Cluster Shared Volume on an iSCSI SAN. This hosts a number of SCVMM libraries for ISO images, production VM images and test stuff. Data de-duplication is again giving us an 80% space saving on the SAN (ISO images of OSes and VHDs of installed OSes dedupe _really_ well).
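
As an aside, enabling de-duplication on one of these data volumes is only a couple of commands once the Data Deduplication feature is installed; the drive letter below is illustrative:

# turn on de-duplication for a data volume and kick off an initial optimisation job
Enable-DedupVolume -Volume "E:" -UsageType Default
Start-DedupJob -Volume "E:" -Type Optimization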

Our Lab cloud currently has three AMD-based servers. They use the same disk setup as the SQL boxes, with a mirrored pair for the OS and RAID5 for VM storage.

Our Production Hyper-V also has three servers, but this time in a Windows Cluster using a Cluster Shared Volume on our other iSCSI SAN for VM storage so it can do automated failover of VMs.

Each of the SQL servers, SCVMM servers and Lab Hyper-V servers uses Windows Server 2012 R2 NIC teaming to combine 2 x 1Gbit NICs, which gives us better throughput and failover. The lab servers have one team for VM traffic and one team for the Hyper-V management traffic used when deploying VMs. That means we can push VMs around pretty much as fast as the disks will push data in either direction, without needing expensive 10Gbit Ethernet.
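
For anyone wanting to replicate this, such a team can be created with the in-box cmdlets; the team and adapter names below are illustrative:

# create a two-NIC team for VM traffic on Windows Server 2012 R2
New-NetLbfoTeam -Name "VMTeam" -TeamMembers "Ethernet 1","Ethernet 2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic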

So I hope that answers any questions.

Getting ‘The build directory of the test run either does not exist or access permission is required’ error when trying to run tests as part of the Release Management deployment

Whilst running tests as part of a Release Management deployment I started seeing the error ‘The build directory of the test run either does not exist or access permission is required’, and hence all my tests failed. It seems that there are a number of issues that can cause this problem, as mentioned in the comments in Martin Hinshelwood’s post on running tests in deployment; in particular, spaces in the build name can cause it, but this was not the case for me.

The strangest point was that it used to work, so what had I changed?

To debug the problem I logged into the test VM as the account the deployment service was running as (a shadow account as the environment was network isolated). I got the command line that the component was trying to run by looking at the messages in the deployment log


I then went to the deployment folder on the test VM

%appdata%\local\temp\releasemanagement\[the release management component name]\[release number]

and ran the same command line. Strange thing was this worked! All the tests ran and passed OK, TFS was updated, everything was good.

It seemed I only had an issue when triggering the tests via a Release Management deployment, very strange!

A side note here: when I say the script ran OK, it did report an error and did not export and unpack the test results from the TRX file to pass back to the console/release management log. Turns out this is because the MTMExec.ps1 script uses [System.IO.File]::Exists(..) to check if the .TRX file has been produced, and this check fails when the script is run manually. This is because it relies on [Environment]::CurrentDirectory, which is not set the same way when the script is run manually as when it is called by the deployment service. When run manually it seems to default to c:\windows\system32, not the current folder.

If you are editing this script, and want it to work in both scenarios, then it is probably best to use the PowerShell Test-Path(..) cmdlet as opposed to [System.IO.File]::Exists(..).
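
A quick way to see the difference from a PowerShell prompt (folder and file names are illustrative):

cd C:\SomeWorkingFolder

# Test-Path resolves relative paths against PowerShell's current location
Test-Path "results.trx"

# the .NET call resolves against [Environment]::CurrentDirectory, which is not
# changed by cd and can still be c:\windows\system32 when a script is run manually
[System.IO.File]::Exists("results.trx")
[Environment]::CurrentDirectory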

So where to look for this problem? The error says something can’t access the drops location, but what?

A bit of thought as to who is doing what can help here


When the deployment calls for a test to be run

  • The Release Management deployment agent pulls the component down to the test VM from the Release Management Server
  • It then runs the PowerShell script
  • The PowerShell script runs TCM.exe to trigger the test run, passing in the credentials to access the TFS server and Test Controller
  • The Test Controller triggers the tests to be run on the Test Agent, providing it with the required DLLs from the TFS drops location – THIS IS THE STEP WHERE THE PROBLEM IS SEEN
  • The Test Agent runs the tests and passes the results back to TFS via the Test Controller
  • After the PowerShell script triggers the test run it loops until the test run is complete.
  • It then uses TCM again to extract the test results, which it parses and passes back to the Release Management server

So a good few places to check the logs.

Turns out the error was being reported on the Test Controller.


(QTController.exe, PID 1208, Thread 14) Could not use lab service account to access the build directory. Failure: Network path does not exist or is not accessible using following user: \\store\drops\Sabs.Main.CI\Sabs.Main.CI_2.3.58.11938\ using blackmarble\tfslab. Error Code: 53

The error told me the folder and who couldn’t access it, the domain service account ‘tfslab’ the Test Agents use to talk back to the Test Controller.

I checked the drops location share and this user has adequate access rights. I even logged on to the Test Controller as this user and confirmed I could open the share.

I then had a thought: this was the account the Test Agents were using to communicate with the Test Controller, but was it the account the controller was running as? A check showed it was not; the controller was running as the default ‘Local System’. As soon as I swapped to using the lab service account (or, I think, any domain account with suitable rights) it all started to work.


So why did this problem occur?

All I can think of is that (to address another issue with Windows 8.1 Coded UI testing) the Test Controller was upgraded to 2013.2 RC, but the Test Agent in this lab environment was still at 2013 RTM. Maybe the mismatch is the issue?

I may revisit and retest with the ‘Local System’ account when 2013.2 RTMs and I upgrade all the controllers and agents, but I doubt it. I have no issue running the test controller as a domain account.

Setting the LocalSQLServer connection string in web deploy

If you are using Web Deploy you might wish to alter the connection string for LocalSQLServer, which is used by the ASP.NET provider for web part personalisation. The default is to use ASPNETDB.mdf in the App_Data folder, but in a production system you could well want to use a ‘real’ SQL server.

If you look in your web.config, assuming you are not using the default ‘not set’ setting, it will look something like

<connectionStrings>
    <clear />
    <add name="LocalSQLServer" connectionString="Data Source=(LocalDB)\projects; Integrated Security=true ;AttachDbFileName=|DataDirectory|ASPNETDB.mdf" providerName="System.Data.SqlClient" />
  </connectionStrings>
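
For a production system the same entry might point at a real SQL server instead, for example (server and database names are illustrative):

<add name="LocalSQLServer" connectionString="Data Source=sqlserver01;Initial Catalog=aspnetdb;Integrated Security=True" providerName="System.Data.SqlClient" />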

Usually you would expect any connection string in the web.config to appear in the Web Deploy publish wizard, but this one does not. I have no real idea why, but maybe it is something to do with having to use <clear /> to remove the default?


If you use a parameters.xml file to add parameters to the web deploy you would think you could add the block

<parameter name="LocalSQLServer" description="Please enter the ASP.NET DB path" defaultvalue="__LocalSQLServer__" tags="">
  <parameterentry kind="XmlFile" scope="\\web.config$" match="/configuration/connectionStrings/add[@name='LocalSQLServer']/@connectionString" />
</parameter>

However, this does not work. In the setparameters.xml that is generated you find two entries, first yours then the auto-generated one, and the last one wins, so you don’t get the correct connection string.

<setParameter name="LocalSQLServer" value="__LocalSQLServer__" />
<setParameter name="LocalSQLServer-Web.config Connection String" value="Data Source=(LocalDB)\projects; Integrated Security=true ;AttachDbFileName=|DataDirectory|ASPNETDB.mdf" />

The solution I found was to manually add the parameter to the parameters.xml file as

<parameter name="LocalSQLServer-Web.config Connection String" description="LocalSQLServer Connection String used in web.config by the application to access the database." defaultValue="__LocalSQLServer__" tags="SqlConnectionString">
  <parameterEntry kind="XmlFile" scope="\\web.config$" match="/configuration/connectionStrings/add[@name='LocalSQLServer']/@connectionString" />
</parameter>

With this form the connection string is correctly modified, as only one entry appears in the generated file.
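
If it has worked, the generated setparameters.xml should now contain just the single entry, still carrying the token:

<setParameter name="LocalSQLServer-Web.config Connection String" value="__LocalSQLServer__" />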

Changing WCF bindings for MSDeploy packages when using Release Management

Colin Dembovsky’s excellent post ‘WebDeploy and Release Management – The Proper Way’ explains how to pass parameters from Release Management into MSDeploy to update web.config files. On the system I am working on I also need to do some further web.config transformation: basically the WCF section is different on a Lab or Production build, as it needs to use Kerberos, whereas local debug builds don’t.

In the past I dealt with this, and with editing the AppSettings, using MSDeploy web.config transformations. This worked fine, but it meant I built the product three times, exactly what Colin’s post is trying to avoid. The techniques in the post for the AppSettings and connection strings are fine, but don’t apply so well to large block swap-outs, as I need for the WCF bindings section.

I was considering my options when I realised there was a simple option:

  • My default web.config has the bindings for local operation i.e. no Kerberos
  • The web.debug.config transformation hence does nothing
  • Both the web.lab.config and web.release.config transformations swap in the Kerberos bindings

So all I needed to do was build the Release build (as you would for a production release anyway); this will have the correct bindings in the MSDeploy package for both Lab and Release. You can then use Release Management to set the AppSettings and connection strings as required.

Simple, no extra handling required. I had thought myself into a problem I did not really have.

Release Management components fail to deploy with a timeout if a variable is changed from standard to encrypted

I have been using Release Management to update some of our internal deployment processes. This has included changing the way we roll out MSDeploy packages; I am following Colin Dembovsky’s excellent post on the subject.

I hit an interesting issue today. One of the configuration variable parameters I was passing into a component was a password field. For my initial tests I had just let this be a clear text ‘standard’ string in Release Management. Once I got this all working I thought I had better switch this variable to ‘encrypted’, so I just changed the type on the Configuration Variables tab.


On doing this I was warned that previous deployments would not be re-deployable, but that was OK for me; it was just a trial system and I would not be going back to older versions.

However, when I tried to run this revised release template all the steps up to the edited MSDeploy step were fine, but the MSDeploy step never ran; it just timed out. The component was never deployed to the target machine’s %appdata%\local\temp\releasemanagement folder.


In the end, after a few reboots to confirm the comms were OK, I just re-added the component to the release template and entered all the variables again. It then deployed without a problem.

I think this is a case of a misleading error message.