Creating Hyper-V hosted Azure DevOps Private Agents based on the same VM images as used by Microsoft for their Hosted Agents

Introduction

There are times when you need to run Private Azure DevOps agents as opposed to using one of the hosted ones provided by Microsoft. This could be for a variety of reasons, including needing to access resources inside your corporate network or needing to have a special hardware specification or set of software installed on the agent.

image

If using such private agents, you really need to have an easy way to provision them. This is so that all your agents are standardised and easily re-creatable. Firstly you don’t want build agents with software on them you can’t remember installing or patching. This is just another form of the “works on one developer’s machine but not another” problem. Also if you have the means to replace the agents very regularly and reliably you can avoid the need to patch them; you can just replace them with newer VMs created off latest patched base Operating System images and software releases.

Microsoft uses Packer to build the VM images into Azure Storage. Luckily, Microsoft have open sourced their build tooling process and configuration, you can find the resources on GitHub

A fellow MVP, Wouter de Kort, has done an excellent series of posts on how to use these Packer tools to build your own Azure hosted Private Agents.

I don’t propose to go over that again. In this post, I will discuss what needs to be done to use these tools to create private agents on your own Hyper-V hardware.

By this point you are probably thinking ‘could this be done with containers? They are designed to allow the easy provisioning of things like agents’.

Well, the answer is yes that is an option. Microsoft provides both container and VM based agents and have only recently split the repo to separate the container creation logic from the VM creation logic. The container logic remains in the old GitHub home. However, in this post I am focusing on VMs, so will be working against the new home for the VM logic.

Preparation – Getting Ready to run Packer

Copy the Microsoft Repo

Microsoft’s needs are not ours, we wanted to make some small changes to the way that Packer builds VMs. The key changes are:

  • We want to add some scripts to the repo to help automate our process.
  • We don’t, at this time, make much use of Docker, so don’t bother to pre-cache the Docker images in the agent. This speeds up the image generation and keeps the VMs VHD smaller.

The way we manage these changes is to import the Microsoft repo into our Azure DevOps Services instance. We can keep our copy up to date by setting an upstream remote reference and from time to time merging in Microsoft’s changes, but more on that later.

All our are changes are done on our own long living-branch, we PR any revisions we make into this long lived branch.

The aim is to not alter the main Microsoft Packer JSON definition as sorting out a three way merge if both theirs and our versions of the main JSON file are updated is harder than I like. Rather if we don’t want a feature installed we add ’return $true’ at the start of the PowerShell script that installs the feature, thus allowing Packer to call the script, but skip the actions in the script without the need to edit the controlling JSON file.

This way of working allows us to update the master branch from the upstream repo to get the Microsoft changes, and then to regularly rebase our changes onto the updated master.

image

A local test of Packer

It is a good idea to test out the Packer build from a development PC to make sure you have all the Azure settings correct. This is done using a command along the lines of

packer.exe" build -var-file="azurepackersettings.json"  -on-error=ask "Windows2016-Azure.json"

Where the ‘windows2016-azure.json’ is the Packer definition and the ‘azurepackersettings.json’ the user configurations containing the following values. See the Packer documentation for more details

{
"client_id": "Azure Client ID",
"client_secret": "Client Secret",
"tenant_id": "Azure Tenant ID",
"subscription_id": "Azure Sub ID",
"object_id": "The object ID for the AAD SP",
"location": "Azure location to use",
"resource_group": "Name of resource group that contains Storage Account",
"storage_account": "Name of the storage account",
"ssh_password": A password",
"install_password": "A password",
"commit_url": "A url to to be save in a text file on the VHD, usually the URL if commit VHD based on"
}

If all goes well you should end up with a SysPrep’d VHD in your storage account after a few hours.

Note: You might wonder why we don’t try to build the VM locally straight onto our Hyper-V infrastructure. Packer does have a Hyper-V ISO builder but I could not get it working. Firstly finding an up to date patched Operative System ISO is not that easy and I wanted to avoid having to run Windows Update as this really slows the creation process . Also the process kept stalling as it could not seem to get a WinRM session, when I looked this seemed to be something to do with Hyper-V Vnet switches. In the end, I decided it was easier just to build to Azure storage. This also had the advantage of requiring fewer changes to the Microsoft Packer definitions, so making keeping our branch up to date easier.

Pipeline Process – Preparation Stages

The key aim was to automate the updating of the build images. So we aimed to do all the work required inside an Azure DevOps multistage pipeline. How you might choose to implement such a pipeline will depend on your needs, but I suspect it will follow a similar flow to ours.

  1. Generate a Packer VHD
  2. Copy the VHD locally
  3. Create a new agent VM from the VHD
  4. Repeat step 3. a few times

There is a ‘what comes first the chicken or the egg’ question here. How do we create the agent to run the agent creation on?

In our case, we have a special manually created agent that is run on the Hyper-V host that the new agents will be created on. This has some special configuration which I will discuss further below.

Stage 1 – Update our repo

As the pipeline has a source of our copy of the repo (and targets our branch), the pipeline will automatically get the latest version of our Packer configuration source in our repo. However, there is a very good chance Microsoft will have updated their upstream repo. We could of course manually update our repo as mentioned above and we do do this from time to time. However, just to make sure we are up to date, the pipeline also does a fetch, merge and rebases our branch on its local copy. To do this it does the following

  1. Adds the Microsoft repo as an upstream remote
  2. Fetches the latest upstream/master changes and merges them onto origin/master
  3. Rebases our working branch onto the updated origin/master

Assuming this all works, and we have not messed up a commit so causing a 3-way merge that blocks the scripts, we should have all Microsoft’s latest settings e.g packages, base images etc. plus our customisation.

Stage 2 – Run Packer

Next, we need to run Packer to generate the VHD. Luckily there is a Packer extension in the Marketplace. This provides two tasks we use

  1. Installs the Packer executable
  2. Run Packer passing in all the values, stored securely as Azure DevOps pipeline variables, as used in the azurepackersettings.json file for a local test plus the details of an Azure subscription.

Being run within a pipeline has no effect on performance, so this stage is still slow, taking hours. However, once it is completed we don’t need to run it again so we have this stage set for conditional execution based on a pipeline variable so we can skip the step if it has already completed. Very useful for testing.

Stage 3 – Copy the VHD to a Local File Share

As we are building local private agents we need the VHD file stored locally i.e. copied down to a local UNC share. This is done with some PowerShell that runs the Azure CLI. It finds the newest VHD in the Azure Storage account and copies it locally, we do assume we are the only thing creating VHDs in the storage account and that the previous stage has just completed.

Again this is slow, it can take many hours depending on how fast your internet connection is. Once the VHD file is downloaded, we create a metadata file contains the name of profile it can be used with e.g. for a VS2017 or VS2019 agent and a calculated VHD file checksum, more details on both of these below.

Now again, as this stage is slow, and once it is completed we don’t need to run it again, we have conditional execution based on a second build variable so we can skip the step if it is not needed.

If all runs Ok, then at this point we have a local copy of a SysPre’d VHD. This can be considered the preparation phase over. These stages need to be completed only once for any given generation of an agent.

Pipeline Process – Deployment Stages

At this point we now have a SysPre’d VHD, but we don’t want to have to generate each agent by hand completing the post SysPrep mini setup and installing the Azure DevOps Agent.

To automate this configuration process we use Lability. This is a PowerShell tool that wrappers PowerShell’s Desired State Configuration (DSC). Our usage of Lability and the wrapper scripts we use are

discussed in this post by a colleague and fellow MVP Rik Hepworth. However, the short summary is that Lability allows you to create an ‘environment’ which can include one or more VMs. In our case, we have a single VM in our environment so the terms are interchangeable in this post.

Each VM in an environment is based on one or more master disk images. Each instance of a VM uses its own Hyper-V diff disk off their master disk, thus greatly reducing the disk space required. This is very useful when adding multiple virtually identical agent VMs to a single Hyper-V host.

A Lability environment allows us to have a definition of what a build VM is i.e. what is its base VHD image, how much memory does it have, are there any extra disks, how many of CPU cores does it have, this list goes on. Also, it allows us to install software, in our case the Azure DevOps agent.

All the Lability definitions are stored in a separate Git repo. We have to make sure the current Lability definitions are already installed along with the Lability tools on the Azure DevOps agent that will be running these stages of the deployment pipeline. We do this by hand  on our one ‘special agent’ but it could be automated.

Remember, in our case, this ‘special agent’ is actually domain-joined, unlike all the agents we are about to create, and running on the Hyper-V host where we will be deploying the new VMs. As it is domain joined it can get to the previously downloaded Sysprep’d VHD and metadata file on a network UNC share. We are not too worried over the ‘effort’ keeping the Lability definitions update as they very rarely change, all changes tend to be in the Packer generated base VHD.

It should be remembered that this part of the deployment is a repeatable process, but we don’t just want to keep registering more and more agents. Before we add a new generation agent we want to remove an old generation one. Hence, cycling old agents out of the system, keeping things tidy.

We have experimented with naming of Lability environments to make it easier to keep things tidy. Currently, we provide two parameters into our Lability configuration

  • Prefix – A short-code to identify the role of the agent we use e.g. ‘B’ for generic build agents and ‘BT’ for ones with the base features plus BizTalk
  • Index – This number is used for two jobs, the first is to identify the environment in a set of environments of the same Prefix. It is also used to work out which Hyper-V VNet the new environment should be attached to on the Hyper-V host. Lability automatically deals with the creation of these VNets if not present.

So for on our system, for example, a VM will end up with a name in the form B1BMAgent2019, this means

  • B – It is a generic agent
  • 1 – It is on the subnet 192.168.254.0, and is the first of the B group of agents
  • BMAgent2019 – It is based on our VS2019 VHD image

Note: Also when an Azure DevOps Agent is registered with Azure DevOps, we also append a random number, based on the time, to the end of the agent name in Azure DevOps. This allows two VMs with the same prefix and index, but on different Hyper-V hosts, to be registered at the same time, or to have multiple agents on the same VM. In reality, we have not used this feature. We have ended up using unique prefix and index across agent our estate with a single agent per VM. 

Stage 4 – Disable the old agent then remove it

The first deployment step is done with a PowerShell script. We check to see if there is an agent registered with the current Prefix and Index. If there is we disable it via the Azure DevOps Rest API. This will not stop the current build but will stop the agent picking up a new one when the current one completes.

Once the agent is disabled we keep polling it, via the API, until we see it go idle. Once the agent is idle we can use the Azure DevOps API to delete the agent’s registration on Azure DevOps.

Stage 5 – Remove the old Environment

Once the agent is no longer registered with Azure DevOps we can then remove the environment running the agent. This is a Lability command that we wrapper in PowerShell scripts

This completely removes the Hyper-V VM and its diff disks that store its data, a very tidy process.

Stage 6 – Update Lability Settings

I said previously that we rarely need to update the Lability definitions. There is one exception, that is the reference to the base VHD. We need to update this to point to the copy of the Packer generated SysPrep’d VHD on the local UNC file share.

We use another PowerShell script to handle this. It scans the UNC share for metadata files to find the one containing the request media type e.g. VS2017 or VS2019 (we only keep one of each type there). It then registers this VHD in Lability using the VHD file path and the previously calculated checksum. Lability uses the checksum to work out if the file has been updated.

Stage 7 – Deploy New Environment

So all we have to do at this point is request Lability to create a new environment based on the variable parameters we pass in i.e. environment definition, prefix and any override parameters (the Azure DevOps Agent configuration) into a wrapper script.

When this script is run, it triggers Lability to create a new VM using an environment configuration.

Lability’s first step is to create the VNet if not already present.

It then checks, using the checksum, if the base Sysprep’d VHD has been copied to the Hyper-V host. If it has not been copied it is done before continuing. This can take a while but is only done once.

Next, the environment (our agent VM) is created, firstly the VM settings are set e.g. CPU & Memory and then the Windows mini setup is handled by DSC. This sets the following

  • Administrator user Account and Password
  • Networking, here we have to rename an Ethernet adapter. We have seen the name of the first Ethernet change across different versions of the Packer image, so to make our lives easier we rename the primary adaptor connected to the VNet to a known value.
  • Swap Disk, set this allow the Operating System to manage this as the default on the Packer image is to use a dedicated drive D: which we don’t have.
  • Create a dedicated drive E: for the agent.
  • Download, install and configure the Azure DevOps agent

DSC handles any reboots required.

After a few minutes, you should see a new registered agent in the requested Azure DevOps Agent Pool.

Stage 8 – Add Capabilities

Our builds make use of Azure DevOps user capabilities to target them to the correct type of agent. We use an yet another PowerShell script that waits until the new agent been registered and then it adds our custom capabilities from a comma-separated string parameter.

A little tip here. A side effect of our Lability configuration is that all the agents have the same machine name. This can make finding the host they are on awkward, especially if you have a few Hyper-V hosts. So to address this problem we add a capability of the Hyper-V hosts name, this is purely to make finding the VM easier if we have to.

Stage  9 – Copy Capabilities

We have seen that some of the Azure DevOps tasks we use have demands that are not met by the System Capabilities. The classic is a task requiring a value for the capability ‘java’ or ‘jdk’ when  the one that is present on the agent is ‘JAVA_HOME’.

To address this, as opposed to adding our own capability which might not point to the correct location, is to copy an existing capability that has the correct value. Again this is done with a PowerShell script that takes as string parameter

So what do we end up with?

When all this completes we have a running private agent that has all the features of the Microsoft hosted ones. As Microsoft adds new functionality or patch their agent images, as long as we regenerate our Packer images, we get the same features.

At this point in time, we have chosen to add any extra software we require after the end of this process, as opposed to within it. In our case, this is basically either BizTalk 2013 and BizTalk 2016 on a single agent in our pool. Again we do this with a series of scripts, but manually run this time. We would like to fully automate the process, but BizTalk does not lend itself to easy installation automation. So, after a good bit of experimentation, we decided the best option for now, was to keep our basic build process as close to the Microsoft Packer images as possible to minimise merge issues, and worry about BizTalk later. As we only have one BizTalk 2013 and one 2016 agent the cost of manually finishing off was not too high.

Where do we go from here?

We now have a process that is automated end to end. However, it can be ‘a little brittle’, but as all the stages tidy up after themselves rerunning jobs is not an issue other than in the time cost.

We still have not decided on a final workflow for the replacement of agent. At this time we use manual approvals before deploying an agent. I am sure this will change as we allow this process to mature.

It is a good starting point.