BM-Bloggers

The blogs of Black Marble staff

Define Once, Deploy Everywhere (Sort of...)

Using Lability, DSC and ARM to define and deploy multi-VM environments

Configuration as code crops up a lot in conversation these days. We are searching for that DevOps Nirvana of a single definition of our environment that we can deploy anywhere.

The solution adopted at Black Marble by my colleagues and me is not quite that, but it comes close enough to satisfy our needs. This document details the technologies and techniques we adopted to achieve our goal. It sounds simple, right?

I want to be able to deploy a collection of virtual machines to my own computer using Hyper-V, to Dev/Test Labs in Azure, and to Azure Stack, using the same description of those virtual machines and their configuration.

Defining Our Platforms

Right now, we use Lab Manager (part of Team Foundation Server) at Black Marble to manage multi-VM environments for testing, hosted on a number of servers managed by System Center Virtual Machine Manager. Those labs are composed of virtual machines that can also be deployed to a developer’s workstation.

The issue is that those environments are pre-built – the machines are configured and the environment saved as a whole. They must be patched when a new lab is created from the stored ‘template’ VMs, and adding a new machine to the lab is a pain.

Lab Manager itself is now end-of-life, so we are looking at alternatives (including Azure Stack – see below).

Microsoft Azure

We already use Azure to host virtual machines. However, even with the lower cost Dev/Test subscription type, running lots of machines in the public cloud can get very expensive.

Azure Dev/Test Labs helps to mitigate this cost issue somewhat by providing a governance wrapper. I can create a Lab and apply rules, such as what types of virtual machine can be created, and automatically shut down running VMs at a set time to limit costs.

Within Azure we use Azure Resource Manager (ARM) templates, which are JSON declarations of the services we require, to deploy our virtual machines. Once the VMs are running, extensions can be injected into them to execute scripts that configure them. With Windows servers, that means using the Desired State Configuration (DSC) extension.

Dev/Test Labs allows me to connect to a Git repository of artefacts. Those artefacts could be items I wish to install into a VM, but they can also be ARM templates to deploy complex environments of multiple VMs. Those ARM templates can then apply DSC configuration definitions to the VMs themselves.

Microsoft Azure Stack

Stack is coming soon. Right now, you can download a Technical Preview, known as the Proof of Concept (POC), that runs on a single machine. Stack is aimed at organisations that have stuff they cannot put in the public cloud, for whatever reason, but want a consistent approach to their development that can span private and public cloud. The final form of Stack is expected to be similar to the current Cloud Platform Solution (CPS), which is way out of my budget. However, the POC runs on a server very close in specification and price point to my existing Lab Manager-controlled servers.

Stack aims to deliver parity with its public cloud older brother. That means that I can use the same ARM templates I use in Azure to deploy my IaaS services on Stack. I have the same DSC extension to inject my configuration, too.

What I don’t have right now on Stack (and it’s unclear what the final product will bring, so I won’t speculate) are the base operating system images that are provided by Microsoft in Azure. I can, however, create my own images and upload them to the internal Stack equivalent of the Azure Marketplace.

Hyper-V

On our desktops, laptops, and servers we use Hyper-V, Microsoft’s virtualisation technology. This offers some parity with Azure – it uses the same VHD disk file format, for example. I don’t get the same complex software-defined networking, but I still get virtual switches to which I can connect machines, and they can be private, internal, or external.

Private switches do what they say on the tin: They are a bubble within which my VMs can communicate with each other but not with the outside world. I can, therefore, have multiple identical bubbles all using the same IP address ranges without issue.

External switches are connected directly to a network adapter on the host. That’s really useful if I need to host servers that deliver services to my organisation, as I need to communicate with them directly. This is great on servers, and is useful on developer workstations with physical NICs. On laptops, however, it gets tricky if you’re using a WiFi network. Wireless networks were never designed with VMs in mind, and the way Windows connects an external switch to a wireless adapter is, quite frankly, a horrible kludge; I’ve always found it terribly unreliable.

Internal switches create a new virtual NIC on the host so it can communicate directly with VMs on the network. In Windows 10, we can use an internal switch alongside a NetNat, which allows Windows 10 to provide network address translation for the virtual network. This gives us a setup like your home internet – VMs can communicate out but no direct inbound connections are allowed (yes, I know you can create NAT publishing rules too, but that’s not a topic for here).

One cool thing about a NetNat is that if you carefully define your IP address ranges, a single NetNat can pass traffic into the networks generated by multiple virtual switches. This allows me to have multiple environments that can coexist on separate subnets.
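To check that a new environment's subnet will actually be reached by an existing NetNat, it is enough to test whether the subnet's addresses fall inside the NAT's internal prefix. The function below is a hypothetical helper of our own naming (Test-IPv4InPrefix), not part of Windows or Lability; it is a minimal sketch that assumes IPv4 only and relies on System.Net.IPAddress plus integer masking.

```powershell
# Hypothetical helper (not part of our scripts): does an IPv4 address fall inside a CIDR prefix?
function Test-IPv4InPrefix {
    param(
        [Parameter(Mandatory)][string]$Address,  # e.g. '192.168.250.1'
        [Parameter(Mandatory)][string]$Prefix    # e.g. '192.168.224.0/19'
    )

    # Convert a dotted-quad address to a 64-bit integer so the arithmetic never overflows.
    function ConvertTo-IPv4Number([string]$Ip) {
        $b = ([System.Net.IPAddress]::Parse($Ip)).GetAddressBytes()
        ([int64]$b[0] * 16777216) + ([int64]$b[1] * 65536) + ([int64]$b[2] * 256) + [int64]$b[3]
    }

    $network, $length = $Prefix.Split('/')
    # Mask with the top $length bits set, e.g. /19 -> 255.255.224.0 (a /0 mask matches everything).
    $mask = [int64]([math]::Pow(2, 32) - [math]::Pow(2, 32 - [int]$length))

    ((ConvertTo-IPv4Number $Address) -band $mask) -eq ((ConvertTo-IPv4Number $network) -band $mask)
}
```

For example, Test-IPv4InPrefix -Address '192.168.250.1' -Prefix '192.168.224.0/19' returns $true, whereas an address in 192.168.223.0/24 returns $false, so that subnet would need its own NetNat or a wider prefix.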

Lability

I’ve saved this until last because it’s sort of the secret sauce in what we’ve been working on. I stumbled on Lability totally by chance, through random internet searching. It’s an open source solution for defining and deploying VMs on Windows, using DSC to declare both the configuration of the environment (the VMs and their settings) and the VMs themselves (the guest OS configuration).

Lability was created by a chap called Iain Brighton and he deserves a great deal of credit for what he’s built.

With Lability, I can use the same DSC configurations that I created for my Azure deployments. I can use the same base VHD images that I need for my Azure Stack Deployments. Lability uses a DSC PowerShell file (.ps1), which can include configurations for multiple nodes – each of the VMs in our environment. It then uses a PowerShell Data file (.psd1) to declare the configuration of the VMs themselves (CPU, RAM, virtual switch etc) as well as pass in configuration details to the DSC file.

If you look at the Lability repo on GitHub you will find links to some excellent articles by people who have used Lability; they take you through setting up your Lability host (your computer) and your first environment.

Identifying Differences

Applying DSC

Lability and the Azure DSC extension work in a subtly but importantly different manner. When you create a DSC configuration, you write a PowerShell configuration which imports DSC Resources that will do the actual configuration work and you call those resources with specified values that declare the state of the configuration you want. Within that PowerShell file you can put functions that figure out some of those values.

When you execute the PowerShell configuration, it runs through that script and generates a MOF file. That file is submitted to the DSC engine on the machine that you are configuring and used to pass parameters into the DSC Resources that are going to execute commands to apply your configuration.

When you use the DSC extension in Azure, it installs the necessary DSC resources on the VM and executes the PowerShell file on that machine, generating the MOF which is then applied.

When you use Lability, the PowerShell file is executed on the host machine and outputs the MOF files – you do this manually before executing a Lability command to create a new lab. Lability then takes care of injecting the MOF and the required DSC resources into the virtual machine, where the configuration is applied.

This is a critical difference! If you look at the examples in the Azure Quickstart repo, all the DSC is written assuming that it is executed on the target VM, and uses PowerShell functions to do things like finding the network adapter or the machine’s IP address. If you look at the examples used in Lability labs, the data file provides many of those pieces of information. If you use the PowerShell from an Azure Quickstart template with Lability, you’ll have some crazy failures, because all those functions execute on the host and therefore get totally incorrect information to pass to the configuration code.

Additionally, none of the Azure examples use a data file to provide configuration data. You might think this is because the data file is not supported. However, this is not true – you can pass a data file in using the DSC extension. Lability makes heavy use of that data file to define our environment.

Networking

In Azure, you cannot set a static IP address from within the VM itself. The networking fabric hands the machine its IP address via DHCP. You can set that IP to be static through the Azure fabric, but not through the VM. That might mean that we don’t know the IP address of a machine before we deploy it.

With Lability, we declare the IP address of the VM in the DSC data file. We could use a DHCP server running on the host, and I do just that myself, but it’s more stuff to install and manage, and for our approach to labs right now we’ve stuck to declaring addresses in the DSC data file.
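Because the address is declared rather than discovered, a configuration can simply look it up from the data file. The sketch below illustrates that lookup with a trimmed, hypothetical data structure and a hypothetical Get-NodeIPAddress function; inside a real DSC configuration the same values arrive through $AllNodes and $node.

```powershell
# Hypothetical, trimmed-down configuration data in the same shape as an environment's psd1.
$ConfigurationData = @{
    AllNodes = @(
        @{ NodeName = 'DC';   Role = 'DomainController'; IPAddress = '192.168.254.2'; PrefixLength = 24 }
        @{ NodeName = 'SR01'; Role = 'MemberServer';     IPAddress = '192.168.254.3'; PrefixLength = 24 }
    )
}

# Hypothetical helper: look up a node's declared address the way a DSC configuration would
# via $AllNodes, rather than discovering it from the machine at compile time.
function Get-NodeIPAddress {
    param(
        [Parameter(Mandatory)][hashtable]$ConfigurationData,
        [Parameter(Mandatory)][string]$NodeName
    )
    $entry = $ConfigurationData.AllNodes.Where({ $_.NodeName -eq $NodeName }, 'First')
    if ($entry.Count -eq 0) { throw "Node '$NodeName' is not declared in the data file." }
    # Return the address in CIDR form, e.g. 192.168.254.2/24
    '{0}/{1}' -f $entry[0].IPAddress, $entry[0].PrefixLength
}
```

Against a real environment, the hashtable would come from Import-PowerShellDataFile pointed at the environment's psd1 file instead of being written inline.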

We also have additional stuff to think about in Azure – public IP addresses, Network Security Groups and possibly User Defined Routing that controls how (and if) we allow inbound traffic from the internet onto our network, what can talk to what and on which ports within our network, and whether we want to push all traffic through appliances for security.

Azure API Versions

When you write an ARM template to define and deploy your services, each of the resources in that template is defined against a versioned API. You specify which API version you are using in the template, and different resource providers have different versions.

Azure Stack does not have all the same versions of the various APIs that are in Azure. Ironically, whilst I have had to make few changes to existing ARM templates in terms of their content in order to successfully use them on Stack, I’ve had to change almost every API version referenced in them. Having said that, I am finding that the API versions I reference for Stack by and large work unchanged if I throw the template at Azure.
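One way to handle the version churn is to keep a map of resource type to API version per target platform and rewrite templates from it. The following is a minimal sketch under that assumption (Set-TemplateApiVersion is a hypothetical name, and the version strings in any usage are placeholders, not a statement of what Stack supports); it only touches top-level resources and skips values that are template expressions.

```powershell
# Hypothetical helper: retarget literal apiVersion values in an ARM template, for example when
# moving a template between Azure and Azure Stack. Only top-level resources are handled, and
# values that are template expressions (starting with '[') are left alone.
function Set-TemplateApiVersion {
    param(
        [Parameter(Mandatory)][string]$TemplateJson,
        [Parameter(Mandatory)][hashtable]$VersionMap   # resource type -> API version to use
    )
    $template = $TemplateJson | ConvertFrom-Json
    foreach ($resource in $template.resources) {
        $current = [string]$resource.apiVersion
        if ($VersionMap.ContainsKey($resource.type) -and -not $current.StartsWith('[')) {
            $resource.apiVersion = $VersionMap[$resource.type]
        }
    }
    # Nested templates can be deep, so raise ConvertTo-Json's default depth limit.
    $template | ConvertTo-Json -Depth 50
}
```

Parameterising API versions, as our DSC extension template does with [parameters('ApiVersion').VirtualMachine], avoids the rewrite entirely; a helper like this is mainly useful for existing templates with literal versions.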

Declaring Specific Goals

We’ve discussed our target platforms and talked about how those differ in terms of our deployment configurations. Let’s talk about what our aims were as we embarked on our project to manage VM labs:

  1. All labs should deploy from greenfield. One of our biggest pain points with our old approach was that our labs were stored as pre-built collections of VMs. We couldn’t change the name of the AD domain; changing IP addresses was complex; adding new VMs was painful; patching a ‘new’ environment could take hours.
    We were very clear that we wanted to create all new labs from base media which we would try to keep current for patches (at least within a few months) and would allow us to create any number of machines and environments.
  2. There should be one configuration for each guest VM, which would be used everywhere. We were very clear that we would create one DSC configuration for each role that we needed (for example, a Domain Controller or an ADFS server) and that configuration would be used whether we were creating a lab on a local machine, in Azure or Azure Stack.
  3. Maintain a distinction between a virtual machine configuration and an environment configuration. We are building a collection of virtual Lego with our VM configurations. Our teams can combine those Lego bricks into environments that may be project specific. There should be a configuration for those environments. We should never alter an existing configuration for a new environment – we should create a new configuration using the existing one as a base (for example, if we need additional roles on our DC for some reason).
  4. Take a common approach with Lability and Azure, whilst accepting we have to maintain two sets of resources.
    Our approach to Azure environments is already modular. We have templates for VMs that are combined into environments through Nested Deployments. This would not change. Our VM definitions would encompass a DSC configuration and an ARM template. Our environments would include both a DSC data file and an ARM template.
  5. Manage and automate the creation of base media. We would need a variety of base VHD files, analogous to the existing marketplace images in Azure: Windows Server (numerous versions), SQL Server, SharePoint, etc. Each of these must be created using scripts so they can be periodically rebuilt to achieve our goal of avoiding time-consuming patching of new environments. In short, we would need an Image Factory.
  6. Setup and use should be straightforward. We need our developers to be able to install all the tooling and get a new lab up and running quickly. We need easy integration with Azure Dev/Test Labs, etc. This would need some process automation around the build and release of the VM configurations and anything else we would create as part of the project.

Things You Will Need

If you want to build the same Lab solution as we did you’re going to need a few things:

  1. Git Repository. All the code and configurations we create are ultimately stored in a central Git Repo. We are using Visual Studio Team Services, as it’s our chosen source control platform.
    Why Git? Two reasons: First of all, it allows us to easily deploy our solution to a developer workstation by simply cloning the repo. Second, Azure DevTest Labs needs a Git Repo to store Artifacts (our ARM templates) for deployment of environments.
  2. Build/Release automation. When we commit to our shared repo, our Build server executes some PowerShell to create deployment artifacts for Azure. It creates Zip archives from our configurations to be used with the DSC extension. It makes no sense to create these by hand and waste space in our repo. Our Release pipeline then automatically pushes our artifacts to an Azure storage account that can be accessed by our developers as a single, central store for VM configurations.
  3. Private PowerShell Repository. We use ProGet to provide a local Nuget/PowerShell/NPM etc repository. We had this in place before we started this project, but it has proved invaluable. The simple reason is that we want to publish DSC Resources for easy consumption and installation by our team. You’d be surprised at how many times we’ve hit a bug in a DSC resource which has been fixed in the source code repo but a new version has not yet been published. Maintaining our own repository allows us to publish our own versions of DSC resources (and in some cases our own bespoke resources).
  4. A server to host your Image Factory. I’m not going to spend time documenting this part of our solution. Far cleverer people than I have written about this and we followed their guidance. You need somewhere to host your images and run the scripts on a schedule to build new ones. Our builds run overnight and we place images on a Windows fileshare.
  5. An Azure subscription. If you want to use the same configuration for on-prem and cloud, saying that you need an Azure sub seems a little obvious. However, we are using nested deployments. These use resources that must be accessible to the Azure fabric at deploy time, and the easiest way to do that is to use Azure Storage. You’ll also need a subscription to host your DevTest lab if that’s your preferred approach. Note that you could have multiple subscriptions – our devs can use their MSDN Azure Benefit to host environments within their own DevTest lab, whilst the artefact store is on a corporate subscription and the artefact repo is in our VSTS.
  6. A code editor that understands PowerShell, DSC and ARM. I prefer Visual Studio and the Azure SDK, but Visual Studio Code is an equally powerful tool for creating and managing the files we are going to use.
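The build step in item 2 can be sketched in PowerShell with Compress-Archive. This is a hypothetical function, not our actual build script, and it assumes the archive layout the DSC extension commonly consumes: the configuration script at the root of the Zip, with each DSC resource module as a top-level folder beside it.

```powershell
# Hypothetical build step: package one VM's DSC configuration plus the DSC resource modules
# it needs into a Zip suitable for the DSC extension's modulesUrl.
function New-DscExtensionArchive {
    param(
        [Parameter(Mandatory)][string]$ConfigurationScript,  # e.g. .\VMs\MyVM1\MyVM1.ps1
        [string[]]$ModuleFolders = @(),                        # e.g. .\Modules\DSC\xNetworking
        [Parameter(Mandatory)][string]$DestinationPath         # should end in .zip
    )
    # Stage everything in a throwaway folder so the archive has exactly the layout we want.
    $staging = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
    New-Item -ItemType Directory -Path $staging | Out-Null
    try {
        # The configuration script sits at the root of the archive...
        Copy-Item -Path $ConfigurationScript -Destination $staging
        # ...and each DSC resource module becomes a top-level folder beside it.
        foreach ($module in $ModuleFolders) {
            Copy-Item -Path $module -Destination $staging -Recurse
        }
        Compress-Archive -Path (Join-Path $staging '*') -DestinationPath $DestinationPath -Force
    }
    finally {
        Remove-Item -Path $staging -Recurse -Force
    }
    Get-Item -Path $DestinationPath
}
```

A release pipeline would then upload the resulting file to the storage account that the environment templates reference.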

Managing our VMs and Environments

After much thought, we came up with a standard folder structure and approach to our VM and environment configurations and the supporting scripts needed to deploy them.

In our code repo we have the following folder structure:

\Environments

This folder contains a series of folders, one per environment.

We specify this folder as the one containing environment templates when the shared repo is connected to an Azure DevTest Lab.

\Environments\MyEnv1

An environment folder contains three files:

\Environments\MyEnv1\MyEnv1.psd1

The psd1 data file must share the same name as the folder. This contains all the configuration settings for all VMs in our environment and is used by Lability and the VM DSC configs.

\Environments\MyEnv1\azuredeploy.json

For DevTest Labs, the environment template used in Azure must be named azuredeploy.json. This template calls a series of other templates to deploy the virtual network and VMs to Azure.

\Environments\MyEnv1\metadata.json

This file is read by DevTest Labs and provides a name and description for our environment.

\VMs

This folder contains subfolders for each of our component Virtual Machines.

\VMs\MyVM1

A VM folder contains at least two files:

\VMs\MyVM1\MyVM1.ps1

The ps1 configuration file must share the same name as the folder. It contains the DSC PowerShell to apply the configuration to the guest VM.

\VMs\MyVM1\MyVM1.json

The json file shares the folder name for consistency. It is called by the azuredeploy.json environment template to create the VM in Azure and Azure Stack.

\Modules

The Modules folder contains shared code of various types.

\Modules\Scripts

The scripts folder contains PowerShell scripts to install and configure our standard Lability deployment, wrap the Lability create and remove commands, and perform build and release tasks.

\Modules\Template

The template folder holds common ARM templates that create standard elements shared between environments and are called by azuredeploy.json.

\Modules\DSC

This folder is used during the build process. All the DSC resources needed in an environment are downloaded to this folder. A script parses the VM DSC configurations called by an environment and creates Zip files, containing the correct DSC resources and DSC PowerShell for that environment, to be uploaded into Azure Storage.
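Because so much depends on these naming rules, a build can check them before doing anything else. Get-MissingLabFile below is a hypothetical helper, sketched under the conventions listed above; it returns the names of any required files missing from an environment or VM folder.

```powershell
# Hypothetical helper: report which files a folder is missing under our naming conventions.
# Environments\<Name> needs <Name>.psd1, azuredeploy.json and metadata.json;
# VMs\<Name> needs at least <Name>.ps1 and <Name>.json.
function Get-MissingLabFile {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][ValidateSet('Environment', 'VM')][string]$Kind
    )
    $name = Split-Path -Path $Path -Leaf
    if ($Kind -eq 'Environment') {
        $required = "$name.psd1", 'azuredeploy.json', 'metadata.json'
    }
    else {
        $required = "$name.ps1", "$name.json"
    }
    # Emit each required file name that does not exist; no output means the folder conforms.
    $required | Where-Object { -not (Test-Path -LiteralPath (Join-Path $Path $_)) }
}
```

Running it over every subfolder of \Environments and \VMs gives a quick pass/fail before any Zip files are built.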

Wrapper Scripts for Lability

Lability is great but is built to work in a certain way. We have three scripts that perform key functions for our deployment.

Install Script

Our installation script performs the following functions:

  1. Creates the C:\Virtualisation base folder we use to store VMs and the Lability working files.
  2. Sets the default Hyper-V locations for Virtual Machines and Virtual Hard disks to c:\Virtualisation
  3. Creates a new Internal Virtual Switch (named in accordance with our convention) and sets the IP address on the NIC created on the host to the required one. Our first switch creates a network of 192.168.254.0/24 and the host gets 192.168.254.1 as its IP address.
  4. Creates a new NetNat with an internal address prefix of 192.168.224.0/19. This prefix covers 192.168.224.0 to 192.168.255.255, so it can pass traffic into and out of the thirty-one /24 subnets we use, from 192.168.224.0/24 up to 192.168.254.0/24. We decided to work from the top down when creating new networks.
  5. Makes sure that the Nuget package provider is installed and registers our ProGet server as a new PowerShell repository. We then remove the default PowerShellGallery registration and make sure our repo is trusted.
  6. Checks to see if Lability is installed and, if not, installs it using Install-Module.
  7. Sets the following Lability defaults using the Set-LabHostDefault command:
    ConfigurationPath: c:\Virtualisation\Configuration
    IsoPath: c:\Virtualisation\ISOs
    ParentVhdPath: c:\Virtualisation\MasterVirtualHardDisks
    DifferencingVhdPath: c:\Virtualisation\VMVirtualHardDisks
    ModuleCachePath: c:\Virtualisation\Modules
    ResourcePath: c:\Virtualisation\Resources
    HotfixPath: c:\Virtualisation\Hotfix
    RepositoryUri: <the URI of our ProGet Server, e.g. https://proget.mycorp.com/nuget/PowerShell/package>
  8. Sets the default virtual switch for Lability environments to our newly created one using the Set-LabVMDefault command.
  9. Registers our VHD base media by calling another script which loads a standard configuration data file. This is separate so we can perform this action independently.
  10. Sets the Lability default media to our Windows Server 2012 R2 Standard VHD using the Set-LabVMDefault command.
  11. Initialises Lability using our configuration with the Start-LabHostConfiguration command.
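The network arithmetic behind steps 3 and 4 can be captured in a small helper so that each new switch gets the next subnet down. Get-LabNetworkPlan and the LabSwitch naming prefix are hypothetical, assuming the 192.168.224.0/19 NetNat and top-down allocation described above; the function only calculates addresses and changes nothing on the host.

```powershell
# Hypothetical helper: describe the Nth lab network, counting down from 192.168.254.0/24 inside
# the 192.168.224.0/19 NetNat prefix. The switch naming scheme is an illustrative placeholder.
function Get-LabNetworkPlan {
    param(
        [ValidateRange(0, 30)][int]$Index = 0,          # 0 -> 192.168.254.x, 30 -> 192.168.224.x
        [string]$SwitchNamePrefix = 'LabSwitch'
    )
    $thirdOctet = 254 - $Index
    [pscustomobject]@{
        SwitchName   = '{0}{1:D2}' -f $SwitchNamePrefix, $Index
        Network      = "192.168.$thirdOctet.0/24"
        HostAddress  = "192.168.$thirdOctet.1"           # host vNIC, used as the VMs' default gateway
        PrefixLength = 24
        NatPrefix    = '192.168.224.0/19'
    }
}
```

On a Hyper-V host the plan could then be applied with New-VMSwitch -Name $plan.SwitchName -SwitchType Internal, followed by New-NetIPAddress -IPAddress $plan.HostAddress -PrefixLength $plan.PrefixLength -InterfaceAlias "vEthernet ($($plan.SwitchName))", and, once per host, New-NetNat with -InternalIPInterfaceAddressPrefix $plan.NatPrefix.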

Once the install script has completed we have a fully configured host ready to deploy Lability labs.
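Step 7 of the install script lends itself to splatting. The sketch below holds the defaults listed above in a single hashtable; the ProGet URI is the placeholder from the list, and the Get-Command guard is our own addition so that the snippet does nothing on a machine without Lability.

```powershell
# Lability host defaults from step 7, held in one hashtable so they can be splatted.
# The ProGet URI is the placeholder from the text; substitute your own feed.
$labHostDefaults = @{
    ConfigurationPath   = 'C:\Virtualisation\Configuration'
    IsoPath             = 'C:\Virtualisation\ISOs'
    ParentVhdPath       = 'C:\Virtualisation\MasterVirtualHardDisks'
    DifferencingVhdPath = 'C:\Virtualisation\VMVirtualHardDisks'
    ModuleCachePath     = 'C:\Virtualisation\Modules'
    ResourcePath        = 'C:\Virtualisation\Resources'
    HotfixPath          = 'C:\Virtualisation\Hotfix'
    RepositoryUri       = 'https://proget.mycorp.com/nuget/PowerShell/package'
}

# Only call Lability when it is actually installed on this machine.
if (Get-Command -Name Set-LabHostDefault -ErrorAction SilentlyContinue) {
    Set-LabHostDefault @labHostDefaults
}
```

Keeping the paths in one place also lets other scripts, such as the deploy wrapper, reuse the same values instead of repeating them.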

Deploy-LocalLab script

Lability has a Start-LabConfiguration command which reads the psd1 configuration data file for an environment and creates the VMs. Before running that, however, you need to execute the PowerShell DSC scripts to generate the MOF files for each VM. Lability injects those, and the DSC resources, into the VMs. A second command, Start-Lab, boots the VMs themselves, respecting boot order and delays that can be declared in the config file.

This is great unless you have a complex lab and need lots of DSC resources to make it work. Our wrapper script does the following, taking an environment name as a parameter:

  1. Reads the psd1 data file for our environment from the correct folder to identify the DSC resources we need (they are listed for Lability). It installs these resources so we can execute the PowerShell configuration scripts and generate the MOFs.
  2. Reads the psd1 data file to identify the VMs we are deploying. Based on the Role information in that file it will execute each of the configuration ps1 files from the VMs folder hierarchy, passing in the psd1 data file. The resultant MOFs get saved in the Lability configuration folder (c:\Virtualisation\Configuration).
  3. Executes the Start-LabConfiguration command, passing in the configuration data file.
  4. If we specify a -Start switch, the script starts the lab with the Start-Lab command.
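The first two steps are mostly bookkeeping over the data file, which can be sketched as a pure function. Get-LocalLabBuildPlan is hypothetical, and it assumes that each node's Role matches a folder and script name under \VMs (for example VMs\DomainController\DomainController.ps1); our real wrapper may map roles differently.

```powershell
# Hypothetical sketch of the first two Deploy-LocalLab steps: derive, from an environment's
# configuration data, the DSC resources to install and the VM configuration scripts to compile.
# Assumption: each node's Role matches a folder and script under \VMs.
function Get-LocalLabBuildPlan {
    param(
        [Parameter(Mandatory)][hashtable]$ConfigurationData,
        [Parameter(Mandatory)][string]$VMsRoot
    )
    [pscustomobject]@{
        # Module specifications listed for Lability under NonNodeData.Lability.DSCResource
        DscResources         = @($ConfigurationData.NonNodeData.Lability.DSCResource)
        # One script per distinct role; compiling it emits a MOF for every node with that role
        ConfigurationScripts = @(
            $ConfigurationData.AllNodes |
                Where-Object { $_.Role } |
                ForEach-Object { $_.Role } |
                Sort-Object -Unique |
                ForEach-Object { Join-Path -Path (Join-Path -Path $VMsRoot -ChildPath $_) -ChildPath "$_.ps1" }
        )
    }
}
```

Each script in the plan would then be dot-sourced and its configuration invoked with -ConfigurationData set to the psd1 file and -OutputPath set to the Lability configuration folder, before Start-LabConfiguration creates the VMs.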

Remove-LocalLab script

Our remove script takes the name of our environment as a parameter. It does the following:

  1. Identifies the VMs in the lab using the Get-LabVM command, passing in the psd1 data file. It checks whether any are running and, if they are, calls the Stop-Lab command.
  2. Executes the Remove-LabConfiguration command, passing in the psd1 data file for the environment.
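The "are any running?" check in step 1 reduces to a small filter. Get-RunningLabVMName is a hypothetical helper; it assumes objects that expose Name and State properties, as Hyper-V's Get-VM output does, and we have not confirmed that Lability's Get-LabVM output carries a State property in the same form.

```powershell
# Hypothetical helper for step 1 of Remove-LocalLab: return the names of VMs still running.
# Assumes each object has Name and State properties (as Hyper-V VM objects do).
function Get-RunningLabVMName {
    param(
        [Parameter(Mandatory, ValueFromPipeline)][object]$VM
    )
    process {
        # Hyper-V reports State as an enum; comparing its string form also works for plain strings.
        if ("$($VM.State)" -eq 'Running') { $VM.Name }
    }
}
```

If the filter returns anything, the script calls Stop-Lab before Remove-LabConfiguration; otherwise it can go straight to the removal.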

Virtual Machine Configuration

We’ve challenged ourselves to only use Desired State Configuration for our VMs. This has been a big change from our previous approach to Azure VMs, which mixed DSC with custom PowerShell scripts deployed with a separate Azure VM extension. This has raised four issues we had to solve:

  1. The list of DSC Resources is growing but not all-encompassing. There are many areas where no DSC modules exist. To overcome this, we have used a mix of SetScript code contained within a DSC configuration (which has some limitations) and bespoke DSC modules hosted in our ProGet repository.
  2. Existing published DSC resources may contain bugs. In many cases code fixing those bugs has been supplied as pull requests but may be undergoing review, and sometimes no new release of the resource has been created. We now have our own separate code repository for DSC resources (including our own) where we keep these, and we publish versions to our own repository. When a new official version including the fixes is released it will supersede our own.
  3. There are some good DSC resources out there on GitHub that aren’t published to the PowerShell gallery. We publish these into our own repository for access.
  4. Azure executes the DSC on the target VM to generate the MOF. Lability executes it on the host machine. That and other differences mean that we have wrapper code to switch the config sections, mostly based on an input parameter named IsAzure. When called from the Azure DSC extension we specify that parameter and on a Lability host we don’t. I realise that purists will argue that this means we don’t really have a single configuration. I would counter that I have a single configuration file and therefore one thing to maintain. I don’t see any issue with logic inside that config deciding what happens.

Sample Configuration

Let’s illustrate our approach with an extract from a configuration. The code below is part of our DomainController config.

The config accepts some parameters. EnvPrefix is used to generate names within the environment. In Azure we use it to prefix our Azure resources. Within the environment it’s used to create things like the AD domain name. IsAzure tells the config whether it is being executed on the host or on the target VM inside Azure.

You’ll notice that we specify the DSC module versions. There are a few reasons why we do this – because some of the DSC resources are unofficial we want to make sure they come from our repository, and the way Lability downloads DSC resources from our ProGet Server means we need to specify a version number. Either way, we benefit from increased consistency – there have been some breaking changes between versions with the official DSC resources in the PowerShell Gallery!
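Pinning versions in two places (Import-DscResource in the configuration and the DSCResource list in the data file) invites drift. A build could compare them; Compare-DscModuleVersion below is a hypothetical sketch that assumes the @{ModuleName="...";ModuleVersion="..."} form used in our configurations and only understands double-quoted values in that order.

```powershell
# Hypothetical build check: compare the module versions pinned by Import-DscResource in a
# configuration script with the RequiredVersion entries Lability reads from the data file.
function Compare-DscModuleVersion {
    param(
        [Parameter(Mandatory)][string]$ConfigurationText,   # contents of e.g. DomainController.ps1
        [Parameter(Mandatory)][hashtable]$ConfigurationData  # the environment's psd1 data
    )
    $pattern = 'ModuleName\s*=\s*"(?<name>[^"]+)"\s*;\s*ModuleVersion\s*=\s*"(?<version>[^"]+)"'

    # Index the data file's DSC resources by module name.
    $declared = @{}
    foreach ($entry in $ConfigurationData.NonNodeData.Lability.DSCResource) {
        $declared[$entry.Name] = $entry.RequiredVersion
    }

    foreach ($match in [regex]::Matches($ConfigurationText, $pattern)) {
        $name = $match.Groups['name'].Value
        $version = $match.Groups['version'].Value
        if ($declared[$name] -ne $version) {
            # Report modules whose pinned version is missing from, or differs in, the data file.
            [pscustomobject]@{ Module = $name; Script = $version; DataFile = $declared[$name] }
        }
    }
}
```

Any output from this check would fail the build before a mismatched module could reach either Lability or the Azure Zip files.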

If we’re in Azure we do things like find the network adapter through code and we don’t specify network addresses. We use the IsAzure parameter to wrap this stuff in if blocks.

The configuration values come from the psd1 data file, regardless of whether we deploy to Azure or locally. We do this to enforce consistency. Even though we probably could have the Azure config self-contained in the script, we don’t.

 

Configuration DomainController {

    param(
        [ValidateNotNull()]
        [System.Management.Automation.PSCredential]$Credential,

        [string]$EnvPrefix,

        [bool]$IsAzure = $false,

        [Int]$RetryCount = 20,
        [Int]$RetryIntervalSec = 30
    )

    Import-DscResource -ModuleName @{ModuleName="xNetworking";ModuleVersion="3.2.0.0"}
    Import-DscResource -ModuleName @{ModuleName="xPSDesiredStateConfiguration";ModuleVersion="6.0.0.0"}
    Import-DscResource -ModuleName @{ModuleName="xActiveDirectory";ModuleVersion="2.16.0.0"}
    Import-DscResource -ModuleName @{ModuleName="xAdcsDeployment";ModuleVersion="1.1.0.0"}
    Import-DscResource -ModuleName @{ModuleName="xComputerManagement";ModuleVersion="1.9.0.0"}

    $DomainName = $EnvPrefix + ".local"

    Write-Verbose "Processing Configuration DomainController"

    Write-Verbose "Processing configuration: Node DomainController"
    node $AllNodes.where({$_.Role -eq 'DomainController'}).NodeName {
        Write-Verbose "Processing Node: $($node.NodeName)"

        if ($IsAzure -eq $true) {
            #Find the first network adapter
            $Interface = Get-NetAdapter | Where-Object Name -Like "Ethernet*" | Select-Object -First 1
            $InterfaceAlias = $($Interface.Name)
        }
        
        LocalConfigurationManager {
            RebootNodeIfNeeded = $true;
            AllowModuleOverwrite = $true;
            ConfigurationMode = 'ApplyOnly'
            CertificateID = $node.Thumbprint;
            DebugMode = 'All';
        }

        # Skip this in Azure
        if ($IsAzure -eq $false) {
            # Set a fixed IP address if the config specifies one
            if ($node.IPaddress) {
                xIPAddress PrimaryIPAddress {
                    IPAddress = $node.IPAddress;
                    InterfaceAlias = $node.InterfaceAlias;
                    PrefixLength = $node.PrefixLength;
                    AddressFamily = $node.AddressFamily;
                }
            }
        }


        # Skip this in Azure
        if ($IsAzure -eq $false) {
            # Set a default gateway if the config specifies one
            if ($node.DefaultGateway){
                xDefaultGatewayAddress DefaultGateway {
                    InterfaceAlias = $node.InterfaceAlias;
                    Address = $node.DefaultGateway;
                    AddressFamily = $node.AddressFamily;
                }
            }
        }

        # Set the DNS server if the config specifies one
        if ($IsAzure -eq $true) {
            if ($node.DnsAddress){
                xDNSServerAddress DNSaddress {
                    Address = $node.DnsAddress;
                    InterfaceAlias = $InterfaceAlias;
                    AddressFamily = $node.AddressFamily;
                }
            }
        } 
        else {
            if ($node.DnsAddress){
                xDNSServerAddress DNSaddress {
                    Address = $node.DnsAddress;
                    InterfaceAlias = $node.InterfaceAlias;
                    AddressFamily = $node.AddressFamily;
                }
            }
        }
            
    }

#End configuration DomainController
}

Sample Data File

Below is a sample data file for an environment containing a Domain Controller and single domain-joined server. Note that the data file contains a mix of data to be processed by the DSC configuration and Lability-specific information that defines the environment, including VM settings and the required DSC resources. When we deploy the lab locally, Lability processes the file to create the Virtual Machines and their hard disks (and create new virtual switches if we declare them). When we deploy in Azure this information is ignored – we can safely use the same data file in both situations.

# Single Domain Controller Lab

@{
    AllNodes = @(
        @{
            # DomainController
            NodeName = "DC";
            Role = 'DomainController';
            DSdrive = 'C:';
            
            #Prevent credential error messages
            PSDscAllowPlainTextPassword = $true;
            PSDscAllowDomainUser = $true;


            # Networking
            IPAddress = '192.168.254.2';
            DnsAddress = '127.0.0.1';
            DefaultGateway = '192.168.254.1';
            PrefixLength = 24;
            AddressFamily = 'IPv4';
            DnsConnectionSuffix = 'lab.local';
            InterfaceAlias = 'Ethernet';


            # Lability extras
            Lability_Media = 'BM_Server_2012_R2_Standard_x64';
            Lability_ProcessorCount = 2;
            Lability_StartupMemory = 2GB;
            Lability_MinimumMemory = 1GB;
            Lability_MaximumMemory = 3GB;
            Lability_BootOrder = 0;
            Lability_BootDelay = 600;
        };
        @{
            # MemberServer
            NodeName = "SR01";
            Role = 'MemberServer';
            DSdrive = 'C:';
            
            #Prevent credential error messages
            PSDscAllowPlainTextPassword = $true;
            PSDscAllowDomainUser = $true;


            # Networking
            IPAddress = '192.168.254.3';
            DnsAddress = '192.168.254.2';
            DefaultGateway = '192.168.254.1';
            PrefixLength = 24;
            AddressFamily = 'IPv4';
            DnsConnectionSuffix = 'lab.local';
            InterfaceAlias = 'Ethernet';


            # Lability extras
            Lability_Media = 'BM_Server_2012_R2_Standard_x64';
            Lability_ProcessorCount = 2;
            Lability_StartupMemory = 2GB;
            Lability_MinimumMemory = 1GB;
            Lability_MaximumMemory = 3GB;
            Lability_BootOrder = 1;
        };

    );

    NonNodeData = @{
        OrganisationName = 'Lab';

        Lability = @{
            EnvironmentPrefix = 'Lab-';

            DSCResource = @(
                @{ Name = 'xNetworking'; RequiredVersion = '3.2.0.0';}
                @{ Name = 'xPSDesiredStateConfiguration'; RequiredVersion = '6.0.0.0';}
                @{ Name = 'xActiveDirectory'; RequiredVersion = '2.16.0.0';}
                @{ Name = 'xAdcsDeployment'; RequiredVersion = '1.1.0.0';}
                @{ Name = 'xComputerManagement'; RequiredVersion = '1.9.0.0';}
            );
        }

    };
};
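The member server’s DNS has to point at the domain controller, and every address must sit in the same subnet, so a single typo in this file can break the whole environment. The following is a minimal Python sketch that mirrors the networking values above and checks them for consistency. It is not part of Lability or our build, and the check rules are my own assumptions about what matters.

```python
import ipaddress

# Networking values mirrored from the two nodes in the configuration data above.
NODES = [
    {"Role": "DomainController", "IPAddress": "192.168.254.2",
     "DnsAddress": "127.0.0.1", "DefaultGateway": "192.168.254.1", "PrefixLength": 24},
    {"Role": "MemberServer", "IPAddress": "192.168.254.3",
     "DnsAddress": "192.168.254.2", "DefaultGateway": "192.168.254.1", "PrefixLength": 24},
]


def check_nodes(nodes):
    """Return a list of problems with the node networking data; [] means consistent."""
    problems = []
    dcs = [n for n in nodes if n["Role"] == "DomainController"]
    if len(dcs) != 1:
        return [f"expected exactly one DomainController, found {len(dcs)}"]
    dc_ip = dcs[0]["IPAddress"]
    seen = set()
    for node in nodes:
        iface = ipaddress.ip_interface(f"{node['IPAddress']}/{node['PrefixLength']}")
        if node["IPAddress"] in seen:
            problems.append(f"duplicate IPAddress {node['IPAddress']}")
        seen.add(node["IPAddress"])
        if ipaddress.ip_address(node["DefaultGateway"]) not in iface.network:
            problems.append(f"{node['Role']}: gateway is outside {iface.network}")
        # Everything except the DC should use the DC for DNS.
        if node["Role"] != "DomainController" and node["DnsAddress"] != dc_ip:
            problems.append(f"{node['Role']}: DnsAddress should be the DC ({dc_ip})")
    return problems


print(check_nodes(NODES))  # → []
```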

Azure DSC Extension

Our Azure deployment uses the configuration and data files to configure the VM. The JSON for the DSC extension is shown below. Notice the following:

1. The modulesUrl setting specifies a Zip file that contains the DSC resources and configuration ps1 file. We create these zip files as part of our build process and upload them to an Azure storage account.

2. The configurationFunction setting specifies the name of the ps1 file to execute and the configuration within it that we want to apply (a single file can contain more than one configuration, although ours don’t).

3. We pass in the EnvPrefix variable and set the IsAzure value to 1 so our configuration executes the right code.

4. The dataBlobUri within protectedSettings is our psd1 data file. The extension treats this as containing sensitive information – things held in this section are not displayed in any output from Azure Resource Manager.

In fairness, at the moment we create JSON specific to each VM. I plan to refactor this into common code that takes parameters, rather than having an ARM template for each VM’s DSC.

      {
        "name": "[concat(parameters('envPrefix'),parameters('vmName'),'/',parameters('envPrefix'),parameters('vmName'),'dsc')]",
        "type": "Microsoft.Compute/virtualMachines/extensions",
        "location": "[parameters('VirtualNetwork').Location]",
        "apiVersion": "[parameters('ApiVersion').VirtualMachine]",
        "dependsOn": [
        ],
        "tags": {
          "displayName": "DomainController"
        },
        "properties": {
          "publisher": "Microsoft.Powershell",
          "type": "DSC",
          "typeHandlerVersion": "2.1",
          "autoUpgradeMinorVersion": true,
          "settings": {
            "modulesUrl": "[concat(parameters('artifactsLocation'), '/Environments/', parameters('envConfig'),'/',parameters('envConfig'),'.zip', parameters('artifactsSasToken'))]",
            "configurationFunction": "DomainController.ps1\\DomainController",
            "properties": {
              "EnvPrefix": "[parameters('EnvPrefix')]",
              "Credential": {
                "userName": "[parameters('adminUsername')]",
                "password": "PrivateSettingsRef:adminPassword"
              },
              "IsAzure": 1
            }
          },
          "protectedSettings": {
            "dataBlobUri": "[concat(parameters('artifactsLocation'), '/Environments/', parameters('envConfig'), '/', parameters('envConfig'),'.psd1', parameters('artifactsSasToken'))]",
            "Items": {
              "adminPassword": "[parameters('adminPassword')]"
            }
          }
        }
      }
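The concat() expressions in modulesUrl and dataBlobUri are easy to misread, so here is a small Python sketch that builds the same two URLs. The storage location, environment name and SAS token in the example are hypothetical placeholders. It also shows that the JSON string for configurationFunction decodes to a single backslash separating the script name from the configuration name.

```python
import json


def artifact_urls(artifacts_location, env_config, sas_token):
    """Mirror the ARM concat() calls used for modulesUrl and dataBlobUri."""
    base = f"{artifacts_location}/Environments/{env_config}/{env_config}"
    return base + ".zip" + sas_token, base + ".psd1" + sas_token


def split_configuration_function(json_literal):
    """Decode a JSON string literal and split it into (script, configuration)."""
    value = json.loads(json_literal)
    script, configuration = value.split("\\")
    return script, configuration


# Hypothetical values for illustration only.
modules_url, data_blob_uri = artifact_urls(
    "https://example.blob.core.windows.net/artifacts", "Lab1", "?sig=abc")
print(modules_url)    # → https://example.blob.core.windows.net/artifacts/Environments/Lab1/Lab1.zip?sig=abc
print(data_blob_uri)  # → https://example.blob.core.windows.net/artifacts/Environments/Lab1/Lab1.psd1?sig=abc
print(split_configuration_function('"DomainController.ps1\\\\DomainController"'))
# → ('DomainController.ps1', 'DomainController')
```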

We don’t include the DSC extension within the ARM template that deploys the VM. Keeping it separate lets us sequence the deployment of configuration to deal with dependencies between servers.
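To illustrate that sequencing, here is a Python sketch (graphlib needs Python 3.9+) that groups DSC deployments into waves. Everything in a wave depends only on earlier waves, so each wave could be deployed in parallel. The domain controller (abbreviated to DC here) and SR01 come from the data file above. The SQL and SharePoint entries are hypothetical. In an ARM master template the equivalent ordering would be expressed with dependsOn between the nested deployments.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def deployment_waves(dependencies):
    """Group deployments into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the dependencies loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# DC and SR01 mirror the data file above; SQL and SharePoint are hypothetical.
DSC_DEPENDENCIES = {
    "DC": set(),
    "SR01": {"DC"},
    "SQL": {"DC"},
    "SharePoint": {"SQL"},
}
print(deployment_waves(DSC_DEPENDENCIES))
# → [['DC'], ['SQL', 'SR01'], ['SharePoint']]
```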

Azure ARM Templates

The approach we take to deploying VMs in Azure has been consistent for some time now. My ResourceTemplates Repo in GitHub uses nested templates to deploy a three-server environment and we use exactly the same approach here. Our ‘master template’ is stored in the environment folder and it calls nested deploys for each VM, VM DSC extension and supporting stuff such as virtual networks. The VM and DSC templates are stored in the VM folder with the DSC config, and the supporting templates are in our Modules\Templates folder since they are shared.

Conclusion

This has been a very long article without a great deal of code in it. I hope this explains how we approach our environment definition and deployment. I plan to do more posts that document more specific elements of a configuration or an environment.

Ultimately, I’m not sure that a single definition covering multiple platforms and both host and guest configurations is achievable. However, I think we’ve got pretty close with our solution, and it involves minimal rework, particularly once you have built up a good library of VM configs that you can combine into an environment.

I should also point out that we are not installing apps – we are deploying a platform onto which our developers and testers can then install the applications they develop. This means that we keep the environments quite generic. Deployment of apps is still scripted (and probably uses VSTS Release Management) but is not included in the configurations we build. Having said that, there is nothing stopping a team extending the DSC to deploy their applications and thus build a more bespoke definition.

I’ve spoken to quite a few people about what we’ve done over the past few weeks and, certainly within the Microsoft space, many people want to do what we have done, but few were aware that tooling such as Lability and DSC was available to get it done. I hope this goes some way to plugging that gap.

Unblocking a stuck Lab Manager Environment (the hard way)

This is a post so I don’t forget how I fixed access to one of our environments yesterday, and hopefully it will be useful to some of you.

We have a good many pretty complex environments deployed to our lab Hyper-V servers, controlled by Lab Manager. Operations such as starting, stopping or repairing those environments can take a long, long time, but this time we had one that was quite definitely stuck. The lab view showed the many servers in the lab with green progress bars about halfway across but after many hours we saw no progress. The trouble is, at this point you can’t issue any other commands to the environment from within the Lab Manager console – it’s impossible to cancel the operation and regain access to the environment.

Normally in these situations, stepping from Lab Manager to the SCVMM console can help. Stopping and restarting the VMs through SCVMM can often give Lab Manager the kick it needs to wake up. However, this time that had no effect. We then tried restarting the TFS servers to see if they’d got stuck, but that didn’t help either.

At this point we had no choice but to roll up our sleeves and look in the TFS database. You’d be surprised (or perhaps not) at how often we need to do that…

First of all we looked in the LabEnvironment table. That showed us our environment, and the State column contained a value of Repairing.

Next up, we looked in the LabOperation table. Searching for rows where the DataspaceId column value matched that of our environment in the LabEnvironment table showed a RepairVirtualEnvironment operation.

In the tbl_JobSchedule table we found an entry where the JobId column matched the JobGuid column from the LabOperation table. The interval on that was set to 15, from which we inferred that the repair job was being retried every fifteen minutes by the system. We found another entry for the same JobId in the tbl_JobDefinition table.

Starting to join the dots, we finally looked in the LabObject table. Searching for all the rows with the same DataspaceId as earlier returned all the lab hosts, environments and machines that were associated with the Team Project containing the lab. In this table, our environment row had a PendingOperationId which matched that of the row in the LabOperation table we found earlier.

We took the decision to attempt to revive our stuck environment by removing the stuck job. That would mean carefully working through all the tables we’d explored and deleting the rows, hopefully in the correct order. As the first part of that, we decided to change the value of the State column in the LabEnvironment table to Started, hoping to avoid crashing TFS should it try to parse all the information about the repair job we were about to slowly remove.

Imagine our surprise, then, when having made that one change, TFS itself cleaned up the database, removed all the table entries referring to the repair environment job and we were immediately able to issue commands to the environment again!
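To make the trail easier to follow, here it is as a toy model in Python using an in-memory SQLite database. Only the table names and the columns named above come from TFS. EnvironmentName, OperationId, OperationName and Name are hypothetical stand-ins, and the real schema is far larger. TFS did the actual clean-up once the state changed, and this toy makes no attempt to simulate that.

```python
import sqlite3

# Toy schema: table names and the columns named in the post are real;
# EnvironmentName, OperationId, OperationName and Name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LabEnvironment (DataspaceId INTEGER, EnvironmentName TEXT, State TEXT);
CREATE TABLE LabOperation (OperationId INTEGER, DataspaceId INTEGER, OperationName TEXT, JobGuid TEXT);
CREATE TABLE tbl_JobSchedule (JobId TEXT, Interval INTEGER);
CREATE TABLE LabObject (DataspaceId INTEGER, Name TEXT, PendingOperationId INTEGER);
INSERT INTO LabEnvironment VALUES (7, 'StuckEnv', 'Repairing');
INSERT INTO LabOperation VALUES (42, 7, 'RepairVirtualEnvironment', 'job-guid-1');
INSERT INTO tbl_JobSchedule VALUES ('job-guid-1', 15);
INSERT INTO LabObject VALUES (7, 'StuckEnv', 42);
""")

# Follow the trail: environment -> operation -> job schedule, and confirm the
# environment object is pointing at the same pending operation.
trail = conn.execute("""
SELECT e.State, o.OperationName, s.Interval, lo.PendingOperationId
FROM LabEnvironment e
JOIN LabOperation o ON o.DataspaceId = e.DataspaceId
JOIN tbl_JobSchedule s ON s.JobId = o.JobGuid
JOIN LabObject lo ON lo.PendingOperationId = o.OperationId
WHERE e.EnvironmentName = ?
""", ("StuckEnv",)).fetchall()
print(trail)  # → [('Repairing', 'RepairVirtualEnvironment', 15, 42)]

# The single change that unblocked the real environment: flip the state.
conn.execute("UPDATE LabEnvironment SET State = 'Started' WHERE EnvironmentName = ?",
             ("StuckEnv",))
conn.commit()
```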

Our TFS Lab Management Infrastructure

Richard and I spend a good deal of time talking about Lab Manager and our environments. I’ve written here before about our migration to the latest versions of the various components of Lab and both Richard and I have delivered sessions at user groups and conferences.

Richard was in Belgium last week for Techorama, after which he was asked about the specifics of our setup. Between us, we came up with a diagram of our Lab Environment and Richard recently posted that to his blog. Hopefully some of you will find it useful.

Migrating to SCVMM 2012 R2 in a TFS Lab Scenario

Last week I moved our SCVMM from 2012 with service pack 1 to 2012 R2. Whilst the actual process was much simpler than I expected, we had a pretty big constraint imposed upon us by Lab Manager that largely dictated our approach.

Our SCVMM 2012 deployment was running on an ageing Dell server. It had a pair of large hard drives that were software-mirrored by the OS and we were using NIC teaming in Server 2012 to improve network throughput. It wasn’t performing that well, however. Transfers from the VMM library hosted on the server to our VM hosts were limited by the speed of the ageing SATA connectors and incoming transfers were further slowed by the software mirroring. We also had issues where Lab Manager would time out jobs whilst SCVMM was still diligently working on them.

Our grand plan involves migrating our VM hosts to Server 2012 R2. That will give us better network transfers of VMs and allow generation 2 VMs on our production servers (also managed by SCVMM). To get there we needed to upgrade SCVMM, and to do that we had to upgrade our Team Foundation Server. Richard did the latter a little while ago, which triggered the SCVMM upgrade.

Our big problem was that Lab is connected extremely strongly to SCVMM. We discovered just how strongly when we moved to SCVMM 2012. If we changed the name of the SCVMM server we would have to disconnect Lab from SCVMM. That would mean throwing away all our environments and imported machines, and I’m not going through the pain of rebuilding all that lot ever again.

I desperately wanted to move SCVMM onto better tin – more RAM, more cores and, importantly, faster disks and hardware mirroring. That led to a migration process that involved the following steps:

  1. Install Server 2012 R2 on our new server. Configure storage to give an OS drive and a data drive for the SCVMM library.
  2. Install the SCVMM pre-requisites on the new server.
  3. Using robocopy, transfer the contents of the SCVMM library to the new server. This needed breaking into blocks as we use data deduplication, and our library share contents are about three times the size of the drive! We could repeat the robocopy script and it would transfer any updated files.
  4. Uninstall SCVMM 2012 from the old server, making sure to keep the database as we do so.
  5. Change the name of the old server, and its IP address.
  6. Change the name of the new server to that of the old one, and change the IP address.
  7. Install SCVMM 2012 R2 onto the new server.
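Breaking the copy into blocks is essentially a packing problem. This Python sketch groups top-level library folders into batches that each fit a size budget, for example the free space on the target before deduplication catches up, and prints a robocopy command per folder. The folder names, sizes, budget and server names are hypothetical. /E and the /R and /W retry switches are standard robocopy options, and because robocopy skips unchanged files by default, re-running a batch only transfers what has changed.

```python
def plan_batches(folders, budget_gb):
    """First-fit decreasing: pack (name, size_gb) pairs into batches under budget_gb."""
    batches = []
    for name, size in sorted(folders, key=lambda f: f[1], reverse=True):
        if size > budget_gb:
            raise ValueError(f"{name} ({size} GB) is larger than the batch budget")
        for batch in batches:
            if batch["total"] + size <= budget_gb:
                batch["folders"].append(name)
                batch["total"] += size
                break
        else:
            batches.append({"folders": [name], "total": size})
    return batches


def robocopy_commands(batch, source_share, target_share):
    """One robocopy line per folder; /E includes subfolders, /R and /W limit retries."""
    return [
        f'robocopy "{source_share}\\{name}" "{target_share}\\{name}" /E /R:2 /W:5'
        for name in batch["folders"]
    ]


# Hypothetical library layout and a 1000 GB per-batch budget.
library = [("VHDs", 900), ("ISOs", 300), ("Templates", 500), ("Scripts", 5)]
for i, batch in enumerate(plan_batches(library, 1000), start=1):
    print(f"Batch {i} ({batch['total']} GB)")
    for line in robocopy_commands(batch, r"\\OLDVMM\MSCVMMLibrary", r"\\NEWVMM\MSCVMMLibrary"):
        print("  " + line)
```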

Almost all of that worked perfectly. When installing SCVMM onto the new server I wanted to use an existing share for the library, sat on drive d: and called MSCVMMLibrary. Setup refused, saying that the server I was installing to already had a share of that name, but on drive c:. Very true – for various reasons the share was indeed on the c: drive, albeit with storage on a separate partition attached with a mount point.

What to do – I couldn’t remove the existing share as I didn’t have SCVMM installed. I didn’t want to roll back either, as the steps were painful enough to deter me. So I looked in the SCVMM database for the share.

Sure enough, there is a table in there that lists the paths for the library shares for each server (tbl_IL_LibraryShare). There was a row with the name of my SCVMM server and a c:\mscvmmlibrary path for the share. I changed the ‘c’ to a ‘d’ and reran setup. It worked like a charm.
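For the record, the change amounted to swapping the drive letter on one stored path. The Python sketch below does that with PureWindowsPath against a toy in-memory table. Only the table name tbl_IL_LibraryShare comes from the real SCVMM database; the ServerName and SharePath columns and the server name are hypothetical, and I wouldn’t recommend editing the real database.

```python
import sqlite3
from pathlib import PureWindowsPath


def change_drive(path, new_drive):
    """Return path with its drive letter replaced, e.g. c:\\x becomes d:\\x."""
    p = PureWindowsPath(path)
    if not p.drive:
        raise ValueError(f"{path} has no drive letter")
    return str(PureWindowsPath(new_drive + "\\", *p.parts[1:]))


# Toy table: only the table name is real; the columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_IL_LibraryShare (ServerName TEXT, SharePath TEXT)")
conn.execute("INSERT INTO tbl_IL_LibraryShare VALUES (?, ?)", ("VMM01", r"c:\mscvmmlibrary"))

rows = conn.execute("SELECT ServerName, SharePath FROM tbl_IL_LibraryShare").fetchall()
for server, path in rows:
    conn.execute(
        "UPDATE tbl_IL_LibraryShare SET SharePath = ? WHERE ServerName = ? AND SharePath = ?",
        (change_drive(path, "d:"), server, path))
conn.commit()
print(conn.execute("SELECT SharePath FROM tbl_IL_LibraryShare").fetchone()[0])  # → d:\mscvmmlibrary
```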

Now, I would not recommend doing what I did, but in the Lab Manager scenario, removing and re-adding that share causes all kinds of trouble as the resources in the library are connected to lab environments. I haven’t had any problems post-upgrade, so it looks like I got away with it. Sadly, this is just another in a long list of issues with the way Lab Manager interacts with SCVMM.

Unexpectedly now doing a session at DDD North 2013

I had a surprise exchange of text messages last night with Andy Westgarth. Sadly, one of the people who was to speak in one of the first session slots has had to pull out. Andy did the thing all the best conference organisers do – he called his friends! As a result, Richard and myself will be presenting a session about our experience with Lab Manager on Saturday morning.

Lab Manager is an interesting part of the development puzzle, allowing automated provisioning of environments that can then have software deployed to them and automated tests run against them. However, building a good Lab Manager environment (or machines to then be composed into an environment) is a very different task from the bare-metal scripting guerrilla devops approach that is very en vogue right now. Richard and I will speak about how we run our Lab from both the perspective of the development/ALM specialist (that would be him!) and the IT guy (that would be me!).

I’ve also been asked to take lots of photos at the event, so if you see me wandering around with my camera, smile and say hi!

Building environments for Lab Manager: Why bare metal scripting fails

In the world of DevOps it’s all about the scripts: I’ve seen some great work done by some clever people to create complex environments with multiple VMs all from scratch using PowerShell. That’s great, but unfortunately in the world of Lab Manager it just doesn’t work well at all.

We’ve begun the pretty mammoth task of generating a new suite of VMs for our Lab Manager deployment to allow the developers and testers to create multi-machine environments. I had hoped to follow the scripting path and create these things much more on the fly, but it wasn’t to be.

I hope to document our progress over the next few weeks. This post is all about the aims and the big issues we have that make us take the path we are following.

Needs and Wants

Let’s start with our requirements:

  • A flexible, multi-server environment with a range of Microsoft software platforms to allow devs to work on complex projects.
  • All servers must be part of the same domain.
  • All products must be installed according to best practice – no running SharePoint as local service here!
  • Multiple versions of products are needed: SharePoint 2010 and 2013; CRM 4 and 2011; SQL 2008 R2 and 2012; Biztalk 2010 and 2013.
  • ‘Flexible’ VMs running IIS or bare server 2008 R2/2012 are needed.
  • No multi-role servers. We learned from past mistakes: Don’t put SQL on the DC because Lab Manager Network Isolation causes trouble.
  • Environments must only consist of the VMs that are needed.
  • Lab Manager should present these as available VMs that can be composed into an environment: No saving complete environments until they have been composed for a project.
  • Developers want to be able to run the same VMs locally on their own workstations for development; Lab Environments are for testing and UAT but we need consistency across them all.

That’s quite a complex set of needs to meet. What we decided to build was the following suite of VMs:

  • Domain Controller. Server 2012.
  • SQL 2012 DB server. Server 2012.
  • SharePoint 2013 (WFE+APP on one box). Server 2012. Uses SQL 2012 for DB.
  • Office Web Apps 2013. Server 2012.
  • Azure Workflow Server (for SharePoint 2013 workflows). Server 2012.
  • CRM 2011 Server. Server 2012. Uses SQL 2012 for DB.
  • Biztalk 2013 Server. Server 2012. Uses SQL 2012 for DB.
  • IIS 8 server. Server 2012.
  • ‘Flexible’ Server 2012. For when you just want a random server for something.
  • SQL 2008 R2 server. Server 2008 R2.
  • SharePoint 2010 (WFE+APP+OWA on one box). Server 2008 R2. Uses SQL 2008 R2 for DB.
  • CRM 4. Server 2008 R2. Uses SQL 2008 R2 for DB.
  • Biztalk 2010. Server 2008 R2. Uses SQL 2008 R2 for DB.
  • IIS 7.5 server. Server 2008 R2.
  • ‘Flexible’ Server 2008 R2.

In infrastructure terms we end up with a number of important elements:

  • Our AD domain and DNS domain: <domain>.local.
  • For SharePoint 2013 Apps we need a different domain. For ease this is apps.<domain>.local.
  • To ensure we can use SSL for our web sites we need a CA. This is used to issue device certs and web certs. For simplicity a wildcard cert (*.<domain>.local) is issued.
  • Services such as SharePoint web applications all get DNS registrations. Each service gets an IP address and these addresses are bound to servers in addition to their primary IPs.
  • All services are configured to work on our private network. If we want them to work on the public network (Lab machines, excluding the DNS, can have multiple NICs and multiple networks) then we’ll deal with that once the environment is composed and deployed through Lab Manager.

Problems and constraints

The biggest problem with Lab Manager is the way Network Isolation works. Lab asks SCVMM to deploy a new environment. If network isolation is required (because you are deploying a DC and member servers more than once through many copies of the same servers) then Lab creates a new Hyper-V virtual network (named with a GUID) and connects the VMs to that. It then configures static addresses on that network for the VMs. It starts with the DC and counts up.

My experience is that trying to be clever with servers that are sysprepped and then run scripts simply confuses the life out of Lab. You really need your VMs to be fully working and finished right out of the gate. Unfortunately, that means building the full environment by hand, completely, and then storing all the VMs in SCVMM before importing each one into Lab.

There are still a few wrinkles that I know we have to iron out, even with this approach:

  • In the wonderful world of the SharePoint 2013 app model we need subdomains and wildcard DNS entries. We also need multiple IP addresses on the server. Right now we haven’t tested this with Lab. What we are hoping is that we can build our environment on the same address space that Lab uses. It counts up from 1, so our additional IPs will count down from 254. What we don’t know is whether Lab will remove all the IP addresses from the NICs when it configures the machines. If it does, then we will need some PowerShell that runs to configure the VMs correctly.
  • Since we don’t have to use all the VMs we have built in any given environment, DNS registration becomes important. Servers should register themselves with the DNS running on our DC, but we need to make sure we don’t have incorrect registrations hanging around.
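A minimal Python sketch of that numbering plan, assuming a /24 isolated network (the 192.168.254.0/24 range here is only an example) and hypothetical machine and service names. Machines are allocated upwards from .1 in the order Lab would see them, service addresses downwards from .254, and the function refuses to let the two ranges meet.

```python
import ipaddress


def plan_addresses(cidr, machines, services):
    """Machines count up from the first host address, services down from the last."""
    hosts = list(ipaddress.ip_network(cidr).hosts())
    if len(machines) + len(services) > len(hosts):
        raise ValueError("machine and service ranges would overlap")
    plan = {name: hosts[i] for i, name in enumerate(machines)}
    plan.update({name: hosts[-1 - i] for i, name in enumerate(services)})
    return plan


# Hypothetical names: DC first, because Lab starts with the DC and counts up.
plan = plan_addresses("192.168.254.0/24",
                      machines=["DC", "SQL", "SP2013"],
                      services=["intranet", "mysites", "apps"])
for name, address in plan.items():
    print(f"{name:10} {address}")
```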

Both of these areas will have to be addressed during this week, so I’ll post an update on how we get on.

Consistency is still key

Even though we can’t use scripts to create our environment from bare metal, consistency is still really important. We are, therefore, using scripts to ensure that we are following a set of fixed, replicable steps for each VM build. By leaving the scripts on the VM when we finished, we also have some documentation as to what is configured. We are also trying, where possible, to ensure that if a script is re-run it won’t cause havoc by creating duplicate configurations or corrupting existing ones.

Each of our VMs has been generated in SCVMM using templates we built for the two base operating systems. That avoids differences in OS install and allows me to get a new VM running in minutes with very little involvement. By scripting our steps, should things go badly wrong we can throw away a VM and run through those steps again. It’s getting trickier as we move forward, though; rebuilding our SQL boxes once we have SharePoint installed would be a pain.

Frustration begets good practice

In fact, one of the most useful things to come out of this project so far is a growing set of robust PowerShell modules that perform key functions for us. They are things that we have scripted already, but in the past these have been scripts we have edited and run manually as part of install procedures. Human intervention meant we created simpler scripts. This week I have been shifting some of the things those scripts do into functions. The functions are much more complex, as they carefully check for success and failure at every step. However, the end result is a separation of the function that does the work and the parameters that change from job to job. The scripts we create for any given installation are now simpler and are paired with an appropriate module of functions.
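The shape of those functions is easier to show than describe. This is a Python sketch of the pattern, not our actual PowerShell modules: an "ensure" function that checks before acting, verifies afterwards, and is safe to re-run, driven by a separate list of parameters. The account names and the in-memory set standing in for the directory are hypothetical.

```python
def ensure(name, is_present, create, log):
    """Create something only if it is missing, then verify it really exists.

    Returns True if a change was made, False if it was already in place.
    """
    if is_present():
        log.append(f"{name}: already present, skipping")
        return False
    create()
    if not is_present():
        raise RuntimeError(f"{name}: create step ran but the check still fails")
    log.append(f"{name}: created")
    return True


# Parameters that change from job to job live in data, not in the function.
SERVICE_ACCOUNTS = ["svc_sql", "svc_sp_farm", "svc_sp_pool"]  # hypothetical names

directory = set()  # stand-in for the real directory
log = []
for _ in range(2):  # the second pass shows that re-running is harmless
    for account in SERVICE_ACCOUNTS:
        ensure(account,
               lambda a=account: a in directory,
               lambda a=account: directory.add(a),
               log)
print(len(directory), sum("created" in line for line in log))  # → 3 3
```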

Deciding where to spend time

Many will read this blog and raise their hands in despair at the time we are spending to build this environment. Surely the scripted approach is better? Interestingly, our developers would disagree. They want to be able to get a new environment up and running quickly. The truth is that our Lab/SCVMM solution can push a new multi-server rig out and have it live and usable far quicker than we could do with bare metal scripts. The potential for failure using scripts if things don’t happen in exactly the right order is quite high. More importantly, if devs are sat on their hands then billable time is being wasted. Better to spend the time up front to give us an environment with a long life.

Dev isn’t production, except it is, sort of…

The crux of this is that we need to build a production-grade environment. Multiple times. With a fair degree of variation each time. If I was deploying new servers to production my AD and network infrastructure would already be there. I could script individual roles for new servers. If I was building a throwaway test or training rig where adherence to best practice wasn’t critical then I could use the shortcuts and tricks that allow scripted builds.

Development projects run into difficulties when the dev environment doesn’t match the rigor of production. We’ve had issues with products like SharePoint, where development machines have run a simple next-next-finish wizard approach to installation. Things work in that kind of installation that fail in a production best practice installation. I’m not jumping for joy over the time it’s taking to build our new rigs, but right now I think it’s the best way.

Speaking at NEBytes about TFS 2012 Lab and SCVMM 2012

On Wednesday 15th May 2013, Black Marble travels north, as Steve Spencer and I will both present sessions for the great guys at NEBytes.

Whilst Steve covers fun hardware and software dev using Gadgeteer, I will be talking about our experiences with TFS 2012 Lab and SCVMM 2012.

If you have seen some of my earlier posts, our migration to the latest and greatest was interesting, to say the least. I learned a great deal about how SCVMM and Lab talk to each other and I will be running through how we built our environment and the things we learned that could save you pain as you follow in our footsteps.

I always enjoy speaking at NEBytes and I’m looking forward to seeing everyone next week!

Fixing Lab Manager environments with brute force

As you’ve probably seen, our Lab Manager/SCVMM 2008 R2 upgrade to SCVMM 2012 SP1 was not the smoothest in the world. The end result was a clean Lab Manager and SCVMM install, but a raft of virtual machines that had previously been part of environments.

In tidying up, Richard and I learned a few things about picking apart VMs that were once part of an environment such that a new environment could be built from the wreckage.

There are two approaches to getting what you need. Firstly, you could simply compose the existing virtual machines into a new environment without storing them in, and deploying them from, SCVMM. Secondly, you could pull the VMs back into SCVMM so that you can build a new environment.

Don’t forget to fix the networks

If you want to use the running VMs you will need to make sure that you have recreated any private network generated by Lab Manager. These are all helpfully listed in the XML configuration file of the VMs. They are normally named Lab_<GUID>_NI so are easy to find in the file. On the Hyper-V host, use Hyper-V Manager to create a new private virtual network with the name you just found. You should then attach the synthetic network adapter of your VMs (not the legacy network adapter) to this private network. If you have a DC, and you told Lab Manager it was a DC, then you are likely to need to hook its legacy adapter to the private network as well.
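With a lot of VMs, picking those names out by eye gets tedious. Here is a small Python sketch that scans the text of a VM configuration file for Lab_<GUID>_NI names. The pattern assumes the GUID is 32 hex digits, with or without hyphens, and the XML fragment in the example is made up rather than copied from a real Hyper-V file.

```python
import re

# Assumes the GUID is 32 hex digits, either plain or in 8-4-4-4-12 form.
LAB_NETWORK = re.compile(
    r"Lab_(?:[0-9A-Fa-f]{32}|[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})_NI")


def lab_network_names(config_text):
    """Return the unique Lab_<GUID>_NI names found in a VM configuration file's text."""
    return sorted(set(LAB_NETWORK.findall(config_text)))


# Made-up fragment; real Hyper-V configuration files look different.
sample = ("<switch>Lab_1b4e28ba-2fa1-11d2-883f-0016d3cca427_NI</switch>"
          "<switch>Lab_1b4e28ba-2fa1-11d2-883f-0016d3cca427_NI</switch>")
print(lab_network_names(sample))  # → ['Lab_1b4e28ba-2fa1-11d2-883f-0016d3cca427_NI']
```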

Scenario 1: Pull existing machines into an environment

The big problem you are likely to find here is that whilst you have imported the VMs onto your Hyper-V server and SCVMM can see the machines just fine, Lab Manager refuses to show them to you.

The reason for this is that Lab Manager believes the VMs are currently part of an environment, just not one it knows about. It therefore hides the VMs from you. It turns out that this is pretty straightforward to fix. In the notes field of the running VM settings you will see a block of XML. That is read by Lab Manager to identify the VMs in environments. Simply delete that XML and the machine will now show up in Lab Manager as being available to compose into an environment.

Scenario 2: Get the VMs back into SCVMM to build a new environment and deploy it.

This is a trickier situation and one which needs to follow the steps I talked about in my previous post about building VMs for Lab Manager.

The problem here is not just the XML, but that Lab Manager has probably mangled the hardware settings of the VM as well. You will need to tidy each VM before storing it in SCVMM ready for Lab Manager:

  • Remove the XML from the notes field.
  • Remove the legacy network adapter.
  • Configure the network adapter within Windows to obtain its IP address and DNS settings from DHCP.
  • Delete any snapshots.
  • Make sure you cleanly shut down the VM – don’t save it!

If you follow those steps you can store the VMs back into SCVMM then build a new environment from the stored VMs. If this still gives you trouble then you should export the VMs from Hyper-V, reimport them as a copy to get a new unique ID and then push those into SCVMM.

So far this has worked just fine for us with Richard working his magic in Lab Manager whilst I fix up VMs in Hyper-V and SCVMM.

Things to remember when building virtual machines for a Lab Manager environment

As you will have read on both mine and Richard’s blogs, we have recently upgraded our Lab environment and it wasn’t the smoothest of processes.

However, as always it has been a learning experience and this post is all about building VM environments that can be sucked into Lab and turned into a Lab environment that can be pushed out multiple times.

Note:  This article is all about virtual machines running on Windows Server 2012 that may have been built on Windows 8 and are managed by SCVMM 2012 SP1 and Lab Manager/TFS 2012 CU1. Whilst the things I have found in terms of prepping VMs for Lab Manager are likely to be common to older versions, your mileage may vary.

Approaches to building environments

There are a number of approaches to building multi-machine environments that developers can effectively self-serve as required:

  • The ALM Rangers have a VM Factory project on Codeplex which aims to deliver scripted build-from-scratch on demand.
  • SCVMM has templates for machines that are part-built and stored after running sysprep. Orchestrator can then be used to deploy templates and run scripts to wire them together.
  • Lab Manager allows you to take running VMs and group them together into an environment. It stores all the VMs in SCVMM and when requested, generates new VMs by copying the ones from the library.

Trouble at ‘mill

There are also a number of problems in this space that must balance the needs of IT pros with the needs of developers:

  • Developers are an impatient bunch. They will request the environment at the last minute and need it deployed as quickly as possible. This doesn’t necessarily work well with complete bare-metal scripted approaches.
  • Developers would also prefer some consistency – if they have to remember more than one set of credentials it’s probably too much. Use different accounts, passwords and machine names for all your environments and it can get tricky.
  • Developers love to use the Lab Manager and Test Manager tooling. This delivers great integration with the Team Project in Team Foundation Server.
  • IT Pros need to deal with issues caused by multiple machines with the same identities sharing a network. This is especially true of domain controllers.
  • IT pros would like to keep the number of snapshots (SCVMM checkpoints) to a minimum, especially when memory images are in play as well.
  • IT pros would prefer the environments used by the developers to match the way things are installed in the real world. This is less critical for the actual development environment but really important when it comes to testing. This tends to lead to requirements for additional DNS entries and multiple user accounts. This is especially true if you are building SharePoint farms properly.

How IT pros would do it…

Let’s use one of our environments as an example. We have a four server set:

  1. The Domain Controller is acting as DNS and also runs SQL Server. It doesn’t have to do the latter, but we were trying to avoid an additional machine. Reporting services and analysis services are installed and reporting services is listening on a host header with a DNS CNAME entry for it.
  2. An IIS server allows for deployment of custom web apps.
  3. A CRM 2011 server is using the SQL instance on the DC for its database and reporting services functions. The CRM system itself is published on another host header.
  4. A SharePoint 2010 server is using the SQL instance as well. It has separate web applications for intranet and mysites and each is published on a separate host header.

If we were building this without Lab Manager then we would give the machines two NICs. One would be on our network and the other on a private network. On the DC we unbind the nasty Windows protocols from our network. Remote desktop is enabled on all machines for the devs to access them.

Lab Manager complicates matters however. It is clever enough to understand that we might need to keep DC traffic away from our network and has a mechanism to deliver this, called Network Isolation. How it actually goes about that is somewhat problematic, however.

Basically, Lab Manager wants to control all the networking in the new environment. To do that it adds new network adapters to the VMs and it uses those new adapters to connect to the main network. It expects a single adapter to be in the original VM, which it connects to a new private network that it creates.

Did I mention that IT pros hate GUIDs? Lab Manager loves them. Whilst I can appreciate that it’s the best way to generate unique names for networks and VMs it’s a complete pain to manage.

Anyway, it’s really, really easy to confuse Lab Manager. Sadly, if the IT pro builds what they consider to be a sensible rig, that will confuse Lab Manager right away. The answer is that we need to build our environment the right way and then trim it in readiness for the Lab Manager bit.

Building carefully

I would build my environment on my Windows 8 box. I create a private network and use that as a backbone for the environment. I assign fixed IP addresses to each server on that network. Each server uses the DC as its DNS. That way I can ensure everything works during build. I also add a second NIC to each box that is connected to my main network. I carefully set the protocols that are bound to that NIC. Both of those network adapters are what Lab Manager calls ‘synthetic’ – they are the native virtualised adapter Hyper-V uses, not the emulated legacy adapter.

I carefully make sure that every DNS entry required for a host header is created as a CNAME pointing at the host record for the relevant server. This is important because all the IP addresses will change when Lab Manager takes over.
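To illustrate why the CNAMEs matter, here is a toy resolver in Python. The zone contents (names such as `sp2010.lab.local` and the `10.0.0.12` and `192.168.23.4` addresses) are hypothetical, and the sketch assumes the SharePoint server re-registers its own host (A) record with the DC’s DNS when its address changes, as domain-joined Windows machines normally do. Aliases pointing at the host name then follow the new address without being touched; host headers registered as their own A records would each need editing by hand.

```python
from typing import Dict


def resolve(name: str, a_records: Dict[str, str], cnames: Dict[str, str],
            max_hops: int = 8) -> str:
    """Follow CNAME aliases until a host (A) record is found."""
    current = name.lower()
    for _ in range(max_hops):
        if current in a_records:
            return a_records[current]
        if current in cnames:
            current = cnames[current].lower()
            continue
        raise KeyError(f"no record for {name}")
    raise RuntimeError(f"CNAME chain for {name} is too long")


# Hypothetical zone as built on the private network.
a_records = {"sp2010.lab.local": "10.0.0.12"}
cnames = {
    # Host headers for the two web applications point at the server's host record.
    "intranet.lab.local": "sp2010.lab.local",
    "mysites.lab.local": "sp2010.lab.local",
}
before = resolve("intranet.lab.local", a_records, cnames)

# Lab Manager assigns a new address; only the server's own A record changes.
a_records["sp2010.lab.local"] = "192.168.23.4"
after = resolve("intranet.lab.local", a_records, cnames)
print(before, "->", after)  # 10.0.0.12 -> 192.168.23.4
```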

I may make snapshots as I build so I can move back in time if something goes wrong.

When built, I will probably store my working rig so I can come back to it later. I will then change the rig, effectively breaking it, in order to work with Lab Manager.

The Lab Manager readiness checklist

  • Lab Manager will fail if there is more than a single network adapter. It must be a synthetic adapter, not a legacy one. The adapter should be set to use DHCP for all its configuration – address and DNS.
  • Install, but do not configure, the Visual Studio Test Agent before you shut the machines down. We’ve seen Lab fail to install the agent many times, but if it’s already there it normally configures it just fine.
  • Delete all the snapshots for the virtual machine. Whilst Lab Manager can cope with snapshots, both it and SCVMM get confused when imported machines have snapshots whose configuration differs from the final configuration. That will stop Lab Manager in its tracks.
  • Make sure there is nothing in the notes field of the VM settings. Both Lab Manager and SCVMM shove crap in there to track the VM. If anybody from either team is listening, this is really annoying and gets in the way of putting notes about the rigs in there. Lab Manager shoves XML in there to describe the environment.
  • Make sure there are no saved states. Your machines need to be shut down properly when you finish, before importing into SCVMM. The machines need to boot clean or they will get very confused and Lab Manager may struggle to make the hardware changes.
  • Make sure you export the machines – don’t just copy the folder structure, even though that’s much easier to do.
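The checklist lends itself to a quick pre-flight check. The Python sketch below is hypothetical: it does not query Hyper-V or SCVMM, and the `VmState` and `Adapter` fields are stand-ins you would fill in by hand (or from your own inventory) before handing a VM over.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Adapter:
    legacy: bool  # True for the emulated legacy adapter, False for synthetic
    dhcp: bool    # True if address and DNS both come from DHCP


@dataclass
class VmState:  # hypothetical description of one VM, not a Hyper-V object
    name: str
    adapters: List[Adapter]
    agent_installed: bool   # Visual Studio Test Agent present
    agent_configured: bool  # ...and already configured (we want False)
    snapshots: int
    notes: str
    saved_state: bool
    exported: bool          # produced by a Hyper-V export, not a folder copy


def readiness_problems(vm: VmState) -> List[str]:
    """Return the checklist items this VM fails; an empty list means ready."""
    problems: List[str] = []
    if len(vm.adapters) != 1:
        problems.append(f"{vm.name}: needs exactly one network adapter, has {len(vm.adapters)}")
    for adapter in vm.adapters:
        if adapter.legacy:
            problems.append(f"{vm.name}: adapter must be synthetic, not legacy")
        if not adapter.dhcp:
            problems.append(f"{vm.name}: adapter must use DHCP for address and DNS")
    if not vm.agent_installed:
        problems.append(f"{vm.name}: install the Visual Studio Test Agent")
    if vm.agent_configured:
        problems.append(f"{vm.name}: leave the Test Agent unconfigured")
    if vm.snapshots:
        problems.append(f"{vm.name}: delete all {vm.snapshots} snapshot(s)")
    if vm.notes.strip():
        problems.append(f"{vm.name}: clear the notes field")
    if vm.saved_state:
        problems.append(f"{vm.name}: shut down cleanly instead of saving state")
    if not vm.exported:
        problems.append(f"{vm.name}: export the VM rather than copying its folder")
    return problems


# Example: the build-time rig, still with its fixed-IP NIC, second NIC and a snapshot.
built = VmState("SP2010", [Adapter(legacy=False, dhcp=False), Adapter(legacy=False, dhcp=True)],
                agent_installed=True, agent_configured=False, snapshots=1,
                notes="", saved_state=False, exported=False)
for problem in readiness_problems(built):
    print(problem)
```

Run against the rig as built, this reports four problems (two adapters, a fixed address, a snapshot, no export); trimmed to a single DHCP synthetic adapter, snapshot-free and exported, it reports none.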

Next, get it into SCVMM

There is a good reason to export the VMs. It turns out that SCVMM latches on to the unique identifier of the VM (logical, if you think about it). The snag is that you can end up with VMs ‘hiding’. If I copy my set of four VMs to an SCVMM library share, I can’t have a copy running as well. Unless you do everything through SCVMM (and for many, many reasons I’m just not going to!), you can end up with confusion. This gets really irritating when you have multiple library shares: if you have copies of a VM in more than one library, one of them will not appear in the lists in SCVMM. There are good reasons why I might want to store those multiple copies.
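A toy model of that behaviour in Python: if the library is keyed on each VM’s unique ID, a second copy with the same ID simply has nowhere to go. This is not SCVMM’s actual logic; which copy wins here (the first share scanned) is an arbitrary choice of the sketch, and the share paths and IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def visible_vms(shares: Dict[str, List[Tuple[str, str]]]) -> Tuple[Dict[str, str], List[str]]:
    """Index VMs by unique ID across library shares.

    shares maps a share path to the (vm_id, vm_name) pairs stored on it.
    Returns the visible index (vm_id -> share) and a list of hidden copies.
    """
    index: Dict[str, str] = {}
    hidden: List[str] = []
    for share, vms in shares.items():
        for vm_id, vm_name in vms:
            if vm_id in index:
                # Same ID already indexed: this copy never shows up in the list.
                hidden.append(f"{vm_name} ({vm_id}) on {share}")
            else:
                index[vm_id] = share
    return index, hidden


shares = {
    r"\\lib1\VMs": [("1111", "SP2010"), ("2222", "SQL")],
    r"\\lib2\Archive": [("1111", "SP2010")],  # an older copy of the same VM
}
index, hidden = visible_vms(shares)
print(hidden)
```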

Back to the plot. SCVMM won’t let us import a VM. We can construct a new one from a VHD, but I have yet to find a way to import a VM (why on earth not? If I’ve missed something, please tell me!). So we need to import our VMs onto a server managed by SCVMM. We have a small box for just this purpose – it’s not managed by Lab Manager but is managed by our SCVMM, so I can pull machines from it into the library.

Import the VMs onto your host using Hyper-V Manager. Make sure you create sensible folder structures and names for them all. Once they are imported, make sure you close Hyper-V Manager. I have seen SCVMM fail to delete VM folders correctly because Hyper-V Manager seems to keep the VHD open for some reason.

In SCVMM, refresh the host you’ve just imported the VMs to. You should see them in the VM list. I tend to refresh the VMs too, but that’s just me. Start the VMs and let SCVMM gather information from them, such as the host name. I usually leave them for a few minutes, then shut them down cleanly from the SCVMM console.

Now that we know SCVMM is happy with them, we can store the VMs in the SCVMM library that Lab Manager uses. You should see them wink out of existence on the VM host once the store is complete.

Create the Lab environment

At this point the IT guys can hand over to the people managing labs. In our case that’s Richard. He can now compose a new environment within Lab Manager and pull the VMs I have just stored into his lab. He tells the lab that it needs to run with network isolation and identifies the DC.

What Lab Manager will then do is deploy a new VM through SCVMM using the ones I built as a source. It will then modify the hardware configuration of the VMs, adding a legacy network adapter. It also configures the MAC address of the existing synthetic adapter to be static.

A new private virtual network is created on the target VM host. It’s really hard to manage these through SCVMM, so if Lab ever leaves them hanging around I delete them using Hyper-V Manager. The synthetic adapters in the VMs are connected to the private network while the legacy adapters are connected to the main network.

Exactly why they do it this way I’m not sure. Other than for PXE boot (which this isn’t doing), I can’t see why legacy adapters are needed. I am assuming the Visual Studio team selected them for a good reason, probably around issuing commands to the VMs, but I don’t know what it is.

When the environment is started, Lab will assign static IP addresses to the NICs attached to the private network. All ours seem to be 192.168.23.x addresses. It will also set the DNS address to be that which has been assigned to the DC in the lab. The legacy adapters will be set to DHCP for all settings. The end result is a DC that is only connected to the private network and all other machines connected to both private and main networks.
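The resulting network layout can be summarised as data. The Python sketch below models the end state as described; it is not Lab Manager’s algorithm. The machine names are hypothetical, handing out addresses sequentially from 192.168.23.1 is an assumption (we only know ours land in 192.168.23.x), and it ignores whatever happens to any extra adapter on the DC, modelling only that the DC ends up on the private network alone.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional

# Private range we have observed in our labs; the exact host numbers Lab picks may differ.
PRIVATE_NET = ipaddress.ip_network("192.168.23.0/24")


@dataclass
class NicConfig:
    network: str            # "private" (isolated) or "main"
    adapter: str            # "synthetic" or "legacy"
    address: Optional[str]  # None means the address comes from DHCP
    dns: Optional[str]      # None means DNS comes from DHCP


def isolated_layout(machines: List[str], dc: str) -> Dict[str, List[NicConfig]]:
    """Model the network end state of a lab running with network isolation."""
    hosts = iter(PRIVATE_NET.hosts())
    private_ip = {name: str(next(hosts)) for name in machines}
    dc_ip = private_ip[dc]
    layout: Dict[str, List[NicConfig]] = {}
    for name in machines:
        # Every machine: synthetic NIC on the private network, static IP, DC for DNS.
        nics = [NicConfig("private", "synthetic", private_ip[name], dc_ip)]
        if name != dc:
            # Everyone except the DC also reaches the main network, fully via DHCP.
            nics.append(NicConfig("main", "legacy", None, None))
        layout[name] = nics
    return layout


lab = isolated_layout(["DC", "SQL", "SP2010"], dc="DC")
for name, nics in lab.items():
    print(name, [(n.network, n.adapter, n.address or "DHCP") for n in nics])
```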

Once the environment is up, Lab Manager should configure the test agent and you’re off. The new lab environment can then be stored in such a way as to allow multiple copies to be deployed as required by the devs.