Complex Azure Template Odyssey Part One: The Environment

Part Two | Part Three | Part Four

Over the past month or two I’ve been creating an Azure Resource Template to deploy an environment which, previously, we’d deployed with old-style PowerShell scripts. In theory, the Resource Template approach would make the deployment quicker, easier to trigger from tooling like Release Manager, and the code easier to read. The aim is to deploy a number of servers that will host an application we are developing, allowing us to easily provision test or demo environments into Azure with as much use of automation as possible. The application itself has a set of system requirements that means I have a good number of tasks to work through:

  1. We need our servers to be domain joined so we can manage security, service accounts etc.
  2. The application uses ADFS for authentication. You don’t just expose ADFS to the internet, so that means we need a Web Application Proxy (WAP) server too.
  3. ADFS, WAP and our application need to use secure connections. We want to be able to deploy lots of these, so things like hostnames and FQDNs for services need to be flexible. That means using our own Certificate Services which we need to deploy.
  4. We need a SQL server for our application’s data. We’ll need some additional drives on this to store data and we need to make sure our service accounts have appropriate access.
  5. Our application is hosted in IIS, so we need a web server as well.
  6. Only servers that host internet-accessible services will get public IP addresses.

We already had scripts to do this the old way. I planned to reuse some of that code, and follow the decisions we made around the environment:

  • All VMs would use standard naming, with an environment-specific prefix. The same prefix would be used for other resources. For example, a prefix of env1 means the storage account is env1storage, the network is env1vnet, the Domain Controller VM is env1dc, etc. The AD domain we created would use the prefix in its name (so env1.local). See the sketch after this list for how that plays out in the template.
  • All public IPs would use the Azure-assigned DNS name for our services – no corporate DNS. The prefix would be used in conjunction with the role when specifying the name for the cloud service.
  • DSC would be used wherever possible. After that, custom PowerShell scripts would be used. The aim was to configure each machine individually and not use remote PowerShell between servers unless absolutely necessary.
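
As a sketch of how that naming convention plays out, every resource name can be derived from a single prefix parameter in the template’s variables section. The parameter and variable names here are my own illustration; they assume an envPrefix parameter like the one shown later:

```json
"variables": {
  "storageAccountName": "[concat(parameters('envPrefix'), 'storage')]",
  "vnetName": "[concat(parameters('envPrefix'), 'vnet')]",
  "dcVmName": "[concat(parameters('envPrefix'), 'dc')]",
  "domainName": "[concat(parameters('envPrefix'), '.local')]"
}
```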

We’d also hit a few problems when creating the old approach, so I hoped to reuse the same solutions:

  • There is very little PowerShell to manage certificate services and certificates. There is an incredibly useful set of modules known as PSPKI which we utilise to create certificate templates and cert requests. This would need to be used in conjunction with our own custom scripts, so it had to be deployed to the VMs somehow.

Azure Resources In The Deployment

Things have actually moved on in terms of the servers I am now deploying (only to get more complex!), but it’s easier to describe the environment as originally planned and successfully deployed.

  • Storage Account. Needed for the hard drives of the multiple virtual machines.
  • Virtual Network. A single subnet for all VMs.
  • Domain Controller
    • NetworkInterface. Will attach to the virtual network.
    • Virtual Machine. Requires an additional virtual hard disk to store domain databases.
      • Diagnostics Extension. Will enable the diagnostics on the Azure VM so we can see things like CPU and disk access through the Azure Portal.
      • DSC Extension. Will do as much configuration as we can, declaratively. It will add the ADDS and ADCS roles and create the domain.
      • CustomScriptExtension. Will deploy PowerShell scripts to perform additional configuration. It will create certificate templates and generate certs for services.
  • ADFS Server
    • NetworkInterface. Will attach to the virtual network.
    • Virtual Machine.
      • Diagnostics Extension. Will enable the diagnostics on the Azure VM so we can see things like CPU and disk access through the Azure Portal.
      • DSC Extension. Add the ADFS role and domain-join the VM.
      • CustomScriptExtension. Will configure ADFS – copying the cert from the DC and creating the federation service.
  • WAP Server
    • NetworkInterface. Will attach to the virtual network.
    • Public IP Address. Will publish the WAP service to the internet.
    • Load Balancer. Will connect the server to the public IP address, allowing us to add other servers to the service later if needed.
    • Virtual Machine.
      • Diagnostics Extension. Will enable the diagnostics on the Azure VM so we can see things like CPU and disk access through the Azure Portal.
      • DSC Extension. Will do as much configuration as we can, declaratively.
      • CustomScriptExtension. Copy the cert from the DC and configure WAP to publish the federation service hosted on the ADFS server.
  • SQL Server
    • NetworkInterface. Will attach to the virtual network.
    • Virtual Machine. Requires two additional virtual hard disks to store DBs and logs.
      • Diagnostics Extension. Will enable the diagnostics on the Azure VM so we can see things like CPU and disk access through the Azure Portal.
      • DSC Extension. It turns out the DSC for SQL is pretty good. We can do lots of configuration with it, to the extent of not needing the custom script extension.
  • Web Server
    • NetworkInterface. Will attach to the virtual network.
    • Virtual Machine.
      • Diagnostics Extension. Will enable the diagnostics on the Azure VM so we can see things like CPU and disk access through the Azure Portal.
      • DSC Extension. Will do as much configuration as we can, declaratively.
      • CustomScriptExtension. Will deploy PowerShell scripts to perform additional configuration.
  • WAP Server 2
    • NetworkInterface. Will attach to the virtual network.
    • Public IP Address. Will publish the web server-hosted services to the internet.
    • Load Balancer. Will connect the server to the public IP address, allowing us to add other servers to the service later if needed.
    • Virtual Machine.
      • Diagnostics Extension. Will enable the diagnostics on the Azure VM so we can see things like CPU and disk access through the Azure Portal.
      • DSC Extension. Will do as much configuration as we can, declaratively.
      • CustomScriptExtension. Will deploy PowerShell scripts to perform additional configuration.

Environment-specific Settings

The nice thing about having a cookie-cutter environment is that very few things vary between deployments, which means very few parameters in our template. We need to set the prefix for all our resource names and the location for our resources, and, because we want to be flexible, the admin username and password.
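
As a sketch, the parameters section ends up this small (the names are illustrative; the admin password uses the securestring type so it never appears in plain text):

```json
"parameters": {
  "envPrefix": {
    "type": "string",
    "metadata": { "description": "Prefix applied to all resource names, e.g. env1" }
  },
  "location": {
    "type": "string",
    "metadata": { "description": "Azure region for all resources" }
  },
  "adminUsername": { "type": "string" },
  "adminPassword": { "type": "securestring" }
}
```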

An Immediate Issue: Network Configuration

Right out of the gate we have a problem to solve. When you create a virtual network in Azure, it provides IP addresses to the VMs attached to it. As part of that, the DNS server address is given to the VMs. By default that is an Azure DNS service that allows VMs to resolve external domain names. Our environment needs the servers to be told the IP address of the domain controller, as it will provide the local DNS services essential to the working of the Active Directory domain.

In our old scripts we simply reconfigured the network to specify the DC’s address after we configured the DC. In the Resource Template world we can reconfigure the vNet by applying new settings to the resource from our template. However, once we have created the vNet in our template, we can’t have another resource with the same name in the same template. The solution is to put the new settings in a second template and call that from our main template as a nested deployment. We can pass the IP address of the DC into that template as a parameter, and we can make the nested deployment depend on the DC being deployed, which means it will run after the DC has been promoted to domain controller.
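
A minimal sketch of that nested deployment follows. It assumes a child template named vnet-update.json that accepts the DNS address as a parameter, a DC DSC extension named ConfigureDC, and an assetLocation parameter pointing at where the templates are stored; all of those names are my illustration, not fixed by Azure:

```json
{
  "name": "updateVNetDns",
  "type": "Microsoft.Resources/deployments",
  "apiVersion": "2015-01-01",
  "dependsOn": [
    "[concat('Microsoft.Compute/virtualMachines/', variables('dcVmName'), '/extensions/ConfigureDC')]"
  ],
  "properties": {
    "mode": "Incremental",
    "templateLink": {
      "uri": "[concat(parameters('assetLocation'), '/vnet-update.json')]",
      "contentVersion": "1.0.0.0"
    },
    "parameters": {
      "dnsServerAddress": { "value": "[variables('dcIpAddress')]" }
    }
  }
}
```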

Deployment Order

One of the nicest things about Resource Templates is that when you trigger a deployment, the Azure Resource Manager parses your template and tries to deploy the resources as efficiently as possible. If you need things to deploy in a sequence, you must specify dependencies in your resources; otherwise they will all deploy in parallel.

In this environment, I need to deploy the storage account and virtual network before any of the VMs. They don’t depend on each other, however, so they can be pushed out first, in parallel. The DC gets deployed next. I need to fully configure it before any other VMs are created, because they need to join our domain and our network has to be reconfigured to hand out the IP address of the DC. Once the DC is done, the network gets reconfigured with a nested deployment. In theory, we should then be able to deploy all the other VMs in parallel, provided we can apply our configuration in sequence, which should be possible if we set the dependencies correctly for our extension resources (DSC and CustomScriptExtension). Configuration of the VMs can mostly happen in parallel, with one exception: the WAP server configuration depends on the ADFS server being fully configured.
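
That last constraint boils down to a single dependsOn entry on the WAP server’s script extension, assuming the ADFS configuration extension is named ConfigureADFS (an illustrative name again):

```json
"dependsOn": [
  "[concat('Microsoft.Compute/virtualMachines/', variables('adfsVmName'), '/extensions/ConfigureADFS')]"
]
```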

Attempt One: Single Template

I spent a long time creating, testing and attempting to debug this as a single template (except for our nested deployment to reconfigure the vNet). Let me spare you the pain by listing the problems:

  • The template is huge: many hundreds of lines. Apart from being hard to work with, that really slows down the Visual Studio tooling.
  • Right now a single template with lots and lots of resources seems unreliable. Deploying an identical template multiple times would randomly fail on different VMs, or succeed, with no rhyme or reason to it.
  • Creating VM extension resources with complex dependencies seems to cause deployment failures. At first I used dependencies in the extensions for the VMs outside of the DC to define my deployment order. After some pain, I realised this was much more prone to failure than treating the whole VM as a block. I also discovered that placing the markup for the extensions within the resources block of the VM itself improved reliability (see the sketch after this list).
  • A single deployment takes over an hour. That makes debugging individual parts difficult and time-consuming.
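
To illustrate that last point, this is the shape that proved reliable for me: the extension declared as a child inside the VM’s own resources array rather than as a free-standing resource. The VM properties and DSC settings are deliberately left empty here to show just the nesting, and the extension name is a placeholder:

```json
{
  "type": "Microsoft.Compute/virtualMachines",
  "name": "[variables('adfsVmName')]",
  "apiVersion": "2015-06-15",
  "location": "[parameters('location')]",
  "properties": { },
  "resources": [
    {
      "type": "extensions",
      "name": "ConfigureADFS",
      "apiVersion": "2015-06-15",
      "location": "[parameters('location')]",
      "dependsOn": [
        "[concat('Microsoft.Compute/virtualMachines/', variables('adfsVmName'))]"
      ],
      "properties": {
        "publisher": "Microsoft.Powershell",
        "type": "DSC",
        "typeHandlerVersion": "2.8",
        "settings": { }
      }
    }
  ]
}
```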

Attempt Two: Multiple Nested Deployments

I now have a rock-solid, reliable deployment. I’ve achieved this by moving each VM and its linked resources (NIC, Load Balancer, Public IP) into separate templates. I have a master template that calls the ‘children’, with dependencies limited to one or more of the other nested deployments. The storage account and initial vNet deploy are part of the master template. The upside of this has been manifold: each template is shorter and simpler, with far fewer variables, now that each deploys only a single VM. I can also choose to deploy a single ‘child’ on its own into an already deployed environment, which lets me test and debug more quickly and easily.
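
The master template then becomes little more than a series of Microsoft.Resources/deployments resources chained with dependsOn. A minimal sketch of one child call, with illustrative deployment and file names and the same assumed assetLocation parameter as before:

```json
{
  "name": "adfsDeploy",
  "type": "Microsoft.Resources/deployments",
  "apiVersion": "2015-01-01",
  "dependsOn": [ "Microsoft.Resources/deployments/dcDeploy" ],
  "properties": {
    "mode": "Incremental",
    "templateLink": {
      "uri": "[concat(parameters('assetLocation'), '/adfs-server.json')]",
      "contentVersion": "1.0.0.0"
    },
    "parameters": {
      "envPrefix": { "value": "[parameters('envPrefix')]" },
      "adminUsername": { "value": "[parameters('adminUsername')]" },
      "adminPassword": { "value": "[parameters('adminPassword')]" }
    }
  }
}
```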

Problem One: CustomScriptExtension

When I started this journey there was little documentation around and Resource Manager was in preview. I really struggled to get the CustomScriptExtension for VMs working. All I had to work with were examples using PowerShell to add VM extensions, and they were just plain wrong for the Resource Template approach. Leaning on the Linux equivalent, plus a lot of testing and poking, got things sorted, and I’ve written up how the extension currently works.
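
For reference, the shape that eventually worked for me looks roughly like this, declared as a child extension of a VM. The extension name, script name and assetLocation parameter are placeholders, and the exact typeHandlerVersion may differ:

```json
{
  "type": "extensions",
  "name": "ConfigureCerts",
  "apiVersion": "2015-06-15",
  "location": "[parameters('location')]",
  "properties": {
    "publisher": "Microsoft.Compute",
    "type": "CustomScriptExtension",
    "typeHandlerVersion": "1.4",
    "settings": {
      "fileUris": [
        "[concat(parameters('assetLocation'), '/ConfigureCerts.ps1')]"
      ],
      "commandToExecute": "powershell.exe -ExecutionPolicy Unrestricted -File ConfigureCerts.ps1"
    }
  }
}
```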

Problem Two: IaaSDiagnostics

Right now, this one isn’t fixed. I am deploying the IaaSDiagnostics extension into the VMs, and it appears to be correctly configured and working. However, the VM blades in the Azure Portal are adamant that diagnostics are not configured. This looks like a bug in the Portal, and I’m hoping the team will resolve it soon.

Configuring the Virtual Machines

That’s about it for talking about the environment as a whole. I’m going to write up some of the individual servers separately as there were multiple hurdles to jump in configuring them. Stay tuned.