Deploying the ASDK for effective development use

Microsoft Azure Stack is a truly unique beast in terms of the capabilities it can bring to an organisation, and the efficiencies it can bring to a project that spans Public Cloud and on-premises infrastructure through its consistency with public Azure.

We’ve been using Stack in anger for a customer project for a year now and have learned several things about development and testing, and how to configure the ASDK to be an effective tool to support the project. This post will summarise those learnings and how I deploy the ASDK so you may mirror our approach for your own projects.

Learning 1: Cloud-consistent, but not quite

Azure Stack is great in that you use the exact same technologies to develop and deploy as its big brother. However, there are some nuances that we’ve hit. Stack lags behind Azure in API versions and provides a subset of Resource Providers. Whilst it is possible to configure your Azure subscription to the same API versions as Stack using policies, that’s not a guarantee of compatibility. In our project a partner org has been using Azure with those policies applied, but they still managed to update the language-specific SDKs for Storage beyond those supported by Stack. You need to be very watchful about version support, and our experience is that there is no replacement for using Stack as part of your pre-production testing effort.

Learning 2: Performance may vary

This one is obvious, if you think about it. Stack is built on different infrastructure than Public Azure, which means we don’t get things like FPGA-accelerated networking, for example. That means you need to performance test your solution on Stack or you may be upset that it doesn’t meet the numbers you got from Azure. Don’t get me wrong – we’ve put some very heavy load on both single-box ASDK and multi-node MAS during our project, but things like VM provisioning times, network performance and storage read/write times are different than production Azure, and if your solution relies on them you need to load test on your target environment.

Learning 3: Identity, identity, identity

Stack can be configured in connected or disconnected modes and that is a big, fundamental difference you need to be aware of. In connected mode you use Azure AD for a single source of identity across your Stack and Azure subscriptions. With disconnected mode you are using ADFS to connect Stack to an Active Directory. The key difference is if you are using Service Principals – in disconnected mode you can only do certificate authentication. This is supported on Azure but not the default. Our approach has been (since our project is disconnected) to do all our dev in an ASDK for absolute confidence of compatibility, but with careful governance you could use public Azure. The practical difference is that with the ASDK you create a service principal using PowerShell and the certificate gets created at that point for you to extract. With public Azure you need to create the cert first and provide it when creating the service principal.

Learning 4: ‘D’ is for Demo, not Dev; at least out-of-the-box

The ASDK is a wonderful thing. It’s easy to download and deploy a single-node version of Stack for testing. However, the installer delivers the system in a bubble. To use it you have to RDP into the host to access the portal and mess around with VPNs. Connecting to your CI/CD tool of choice (we use VSTS) is hard verging on impossible and at the very least active use for development is impractical.

You also have to contend with the fact that the ASDK is built on evaluation media, and that you can’t upgrade it from version to version. I have no argument as to why this is – you shouldn’t be able to use the dev kit as a cheap way of getting Stack into your prod environment, but it means that if you want to use the ASDK for development you need to roll your sleeves up a bit… More on that in a moment.

Learning 5: Matt McSpirit Rocks!

For a long time I kept thinking that I should collect the ragtag bunch of PowerShell I used to deploy our ASDK each month into something resembling a usable solution and publish it. Then Matt McSpirit released ConfigASDK and I no longer needed to! If you use ASDK at all, then you need Matt’s Script because it makes post-install configuration an absolute breeze!

Effective Deployment 1: Hacking the Installer

We now have more than one ASDK and I need to rebuild each one pretty much monthly. That means streamlining the process. To do that, I convert the downloaded VHD to a WIM and we import it into our Config Manager deployment so I can use PXE to easily reimage the servers. Once that’s done I can clear down the storage spaces left over from the previous install and rebuild. I need the ASDKs to be on our network, so I assign a /16 network range for each Stack as well as assigning an IP on our main subnet to the host. I also modify the deployment scripts so I can use a custom region and external DNS suffix which allows us to access the ASDKs over DirectAccess – handy for when we’re working on site with our customer.

To do all this I loosely follow the documentation to deploy using PowerShell.

Reconfigure Network Ranges

The networking for the ASDK is all configured from the OneNodeCustomerConfigTemplate.xml file that is in C:\CloudDeployment\Configuration. I simply open that file and replace all occurrences of 192.168. with my own /16 IPv4 range (e.g. 10.1.). That will give you an ASDK with several address ranges and, importantly, a full /24 block of IP addresses for services (storage endpoints, Public IP Addresses, etc.) that can sit on your network.
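That find-and-replace can be scripted; here’s a minimal sketch assuming my example path and the 10.1. range (adjust both for your own network):

```powershell
# Sketch: swap the default 192.168. ranges in the ASDK network template
# for your own /16 prefix (10.1. is just my example)
$template = 'C:\CloudDeployment\Configuration\OneNodeCustomerConfigTemplate.xml'
(Get-Content $template -Raw) -replace '192\.168\.', '10.1.' |
    Set-Content $template
```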

You’ll need to configure your own network to publish the route into that /16 address range, with an IP address on your own network as the gateway address – we’ll assign this to the VM in Stack that does the networking during the installation.

Reconfigure Region and ExternalDomainSuffix

When you deploy AzureStack it will create an AD called Azurestack.local that the infrastructure VMs will join, including the host. You will also find that the portal, admin portal and services are published on a DNS domain of local.azurestack.external. In that domain, ‘local’ is the region and ‘azurestack.external’ is the external domain suffix. Understandably, for a dev/demo deployment these are not exposed to you as things you can change. However, changing them is actually pretty easy when you know where to look.

When you run the InstallAzureStackPOC.ps1 command, it actually calls the DeploySingleNode.ps1 script that is in C:\CloudDeployment\Setup, and if you open that script you’ll find that there are parameters called RegionName and ExternalDomainSuffix, and both these params have default values. Simply open that file in an editor and change them. Line 143 contains the $RegionName parameter with a default of ‘local’ so I change that to be the name of my ASDK host; line 130 contains the $ExternalDomainSuffix parameter with a default of ‘azurestack.external’ so I change that to stack.<my internal domain suffix>.
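For illustration, after my edits the two parameter declarations in DeploySingleNode.ps1 look something like this (‘asdk01’ and ‘stack.mydomain.local’ are example values, and the exact line numbers move between builds):

```powershell
# Edited defaults in DeploySingleNode.ps1 (illustrative values)
[string] $ExternalDomainSuffix = 'stack.mydomain.local',

[string] $RegionName = 'asdk01',
```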

This approach means that I end up with all the ASDKs within stack.mydomain.local, with the name of the host being the region – that’s a good logical approach as far as I and my team are concerned!

Now, doing that isn’t quite all you need to do. Just like with the network routing requirement, you have to get the DNS requests to the Azure Stack DNS or nothing will work. If you follow my approach for the network address range, then the next step is to create some delegated subdomains in your internal DNS. We are using Windows Server and our internal DNS is linked to our AD, so the steps are as follows:

  1. Open the DNS console on your DC.
  2. Within your internal domain, create a HOST record for the stack DNS. I normally call mine <host>stackdns.<mydomain> because I have more than one. Assuming you’ve done the networking changes like me, the IP address for that is <your /16 subnet>.200.67 (e.g. 10.1.200.67).
  3. Within your internal domain create a subdomain of your internal domain called ‘stack’.
  4. Within that stack subdomain, right-click and select New Delegation. In the dialog that pops up enter the name you specified as the RegionName and then when asked for a DNS FQDN enter the hostname you just registered.
  5. If you are using App Services you also need to create a delegation for their domain – <region>.cloudapp.<external domain suffix>. Simply create another subdomain within stack called cloudapp and then repeat the previous step to register the <region> delegation.
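If you’d rather script those DNS steps, the DnsServer module on the DC can do it. This is a sketch using my example names (asdk01 as the region, mydomain.local as the internal domain, 10.1.200.67 as the stack DNS), so verify the delegation names match what the GUI steps above produce:

```powershell
# Step 2: HOST record for the stack DNS server
Add-DnsServerResourceRecordA -ZoneName 'mydomain.local' `
    -Name 'asdk01stackdns' -IPv4Address '10.1.200.67'

# Steps 3-4: delegate the region under the 'stack' subdomain
Add-DnsServerZoneDelegation -Name 'mydomain.local' `
    -ChildZoneName 'asdk01.stack.mydomain.local' `
    -NameServer 'asdk01stackdns.mydomain.local' -IPAddress '10.1.200.67'

# Step 5 (App Services only): repeat for the cloudapp delegation
Add-DnsServerZoneDelegation -Name 'mydomain.local' `
    -ChildZoneName 'asdk01.cloudapp.stack.mydomain.local' `
    -NameServer 'asdk01stackdns.mydomain.local' -IPAddress '10.1.200.67'
```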

Once you’ve configured your internal network and DNS, then modified the config file and script, you can perform the installation. When doing so, make sure that in the parameters of InstallAzureStackPOC.ps1 you specify the NatIPv4Address parameter as the IP address you used in your network router configuration.
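Pulling the pieces together, my install invocation looks roughly like this (the address is a placeholder for my environment; check the current ASDK docs for the full parameter set):

```powershell
# Illustrative install command; the address is a placeholder
$adminPass = Read-Host -AsSecureString -Prompt 'Local admin password'
Set-Location C:\CloudDeployment\Setup

# NatIPv4Address must match the gateway IP published in your router config
.\InstallAzureStackPOC.ps1 -AdminPassword $adminPass `
    -NatIPv4Address '172.16.0.10'
```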

Effective Deployment 2: Removing the NAT

This bit hasn’t changed since the preview releases and I follow the notes on the AzureStack blog post. Once the ASDK install completes I simply run a short script that contains the PowerShell at the end of that post to remove the NAT, at which point that VM sits on my internal network with the IP address I set in the deployment and have configured in my router, and traffic just flows.

Effective Deployment 3: ConfigASDK

Matt’s ConfigASDK script is excellent, and I use this to configure our ASDKs post-install. The script will deploy additional resource providers for you (SQL, MySQL and App Services), install handy utils on the host and pull down a bunch of marketplace offerings you’ll find helpful. For our project we are not using the SQL and MySQL RPs so I skip those, along with the host customisation part.

Now, Matt’s script is built with the Stack team and as such doesn’t currently support the hacks for region and external DNS. I have a fork of the repo in my GitHub where I have added the RegionName and ExternalDomainSuffix params to Matt’s script, but that may not be up to date so you should look at what I’ve done and proceed with caution. I had to put a bunch of code in to replace what were understandably-fixed URLs. However, once I’d done that all Matt’s code worked as expected.

Usual disclaimers apply – use my version of Matt’s script at your own risk. Make sure you understand the implications of what I’m describing here before you do this on your own network!

Post-Deployment Configuration Tips

Once you’ve got your ASDK up and running there are still a couple of things you should tweak for a happier time, particularly if you batter the thing as much as we do in terms of creation and deletion of resources and load testing.

Xrp VM RAM

Something you can do with the ASDK but not with MAS is get at the VMs that form the fabric of Stack. In an ASDK these are not resilient and are smaller in config than their production counterparts. Our experience is that one of them does not have enough RAM configured by default. The AzS-Xrp01 VM is created with 8192MB of RAM. We see its demand climb well beyond this in use and, as it does, you may see storage failure messages when trying to create accounts, access the storage APIs or open blades in the portal. I give it another 4GB – increasing it to 12288MB in total – and that seems to sort it. You don’t need to turn off the VM to do this – just go into Hyper-V Manager and increase the amount.
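If you prefer to script the change from the ASDK host, Hyper-V PowerShell can make the same adjustment (a sketch; if the VM has static memory the cmdlet may insist it is stopped, in which case use Hyper-V Manager as described):

```powershell
# Bump AzS-Xrp01 from 8GB to 12GB of RAM on the ASDK host (sketch)
Set-VMMemory -VMName 'AzS-Xrp01' -StartupBytes 12288MB
```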

App Services ScaleSets

As of 1808 you can do this in the portal rather than through PowerShell. When App Services is installed there are a number of VM scalesets created that contain the VMs that host the App Hosting Plans. The thing is, the App Services team don’t know what you need, so they deploy with a single Shared worker tier VM and none in the other tiers. You need to go into the portal and change the instance count to an appropriate number for your project. I normally scale back Shared to zero and set the Small scaleset to a reasonable number based on the project. I also set the Management scaleset to two instances. I’ve found that makes for a more reliable system, again, because we tend to batter our ASDK during development.

Parting thoughts

Please remember that the approach I describe here is not the way the Stack team expect you to deploy an ASDK. You need to have a reasonable understanding of networking and DNS to know the implications of what I change in my approach to deployment. Also bear in mind that Stack releases monthly updates so if you’re reading this in a year’s time, stuff might have changed so be careful.

Azure Stack is a very powerful solution for a very specific set of customer requirements – it’s not designed to be something everybody should use and you need to remember that. If you do have requirements that fit, hopefully this post will help you configure your ASDK to be a useful component in your development approach.

Postmortem published by the Microsoft VSTS Team on last week’s Azure outage

The Azure DevOps (VSTS) team have published the promised postmortem on the outage on the 4th of September.

It gives good detail on what actually happened to the South Central Azure Datacenter and how it affected VSTS (as it was then called).

More interestingly it provides a discussion of mitigations they plan to put in place to stop a single datacentre failure having such a serious effect in the future.

Great openness as always from the team.

VSTS becomes Azure DevOps

Today Microsoft made a big announcement: VSTS is now Azure DevOps.

The big change is they have split VSTS into 5 services you can use together or independently, including Azure Pipelines for CI/CD – free for open source and available in the GitHub CI marketplace.

An important thing to note is that IT IS NOT JUST FOR AZURE.

Don’t be afraid of the name. There is a wide range of connectors to other cloud providers such as AWS and Google Cloud, as well as many other DevOps tools.

Learn more in the official post.

Videos do not play in VSTS WIKI via relative links – workaround

The Problem

The documentation for the VSTS WIKI suggests you can embed a video in a VSTS WIKI using the markdown/HTML

<video src="_media/vstswiki_mid.mp4" width=400 controls>
</video>

The problem is that this does not seem to work: the MP4 just does not appear and you get an empty video player.

However, if you swap to a full URL it does work e.g.

<video src="https://sec.ch9.ms/ch9/7247/7c8ddc1a-348b-4ba9-ab61-51fded6e7247/vstswiki_high.mp4" width=400 controls> 
</video>

This is a problem if you wish to store media locally in your WIKI

The Workaround

The workaround is to either place the MP4 file in some URL-accessible location e.g. some Azure web space (not really addressing the problem), or more usefully use the VSTS API to get the file out of the repo that backs the WIKI.

The format of the HTML tag becomes

<video src="https://vstsinstance.visualstudio.com/MyTeamProject/_apis/git/repositories/MyTeamProject.wiki/Items?path=_media%2Fvstswiki_high.mp4" width=400 controls>
</video>

This will get the current version of the file on the default branch; you can add extra parameters to specify versions and branches if required, as per the API documentation.

So it is not a perfect solution, as you have to think about branches and versions (they are not handled automatically), but at least it does work.

Configure Server 2016 ADFS and WAP with custom ports using Powershell

A pull request for Chris Gardner’s WebApplicationProxyDSC is now inbound after a frustrating week of trying to automate the configuration of ADFS and WAP on a Server 2016 lab.

With Server 2016, the PowerShell commands to configure the ADFS and WAP servers include switches to specify a non-default port. I need to do this because the servers are behind a NetNat on a server hosting several labs, so port 443 is not available to me and I must use a different port.

This should be simple: Specify the SSLPort switch on the Install-ADFSFarm command and the HttpsPort on the Install-WebApplicationProxy command. However, when I do that, the WAP configuration fails with an error that it cannot read the FederationMetadata from the proxy.

I tried all manner of things to diagnose why this was failing and in the end, the fix is a crazy hack that should not work!

The proxy installation, despite accepting the custom port parameter, does not build the URLs correctly for the ADFS service, so is still trying to call port 443. You can set these URLs on a configured WAP service using the Set-WebApplicationProxyConfiguration command. However, when you run this command with no configured proxy, it fails.

Or so you think…

On the ADFS Server:

  1. Install-AdfsFarm specifying the CertificateThumbprint, Credential, FederationServiceDisplayName, FederationServiceName and SSLPort params

On the WAP Server:

  1. Install-WebApplicationProxy specifying the HttpsPort switch, CertificateThumbprint, FederationServiceName and FederationServiceTrustCredential params.
  2. Set-WebApplicationProxyConfiguration specifying the ADFSUrl, OAuthAuthenticationURL and ADFSSignOutURL parameters with the correct URLs for your ADFS server (which include the port in the Url).
  3. Re-run the command in step 1.
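Put together, the sequence looks roughly like this (thumbprint, service name and port are placeholders for my lab; treat it as a sketch of the workaround, not a supported recipe):

```powershell
# Placeholders for my lab values
$thumb = '<certificate thumbprint>'
$cred  = Get-Credential

# ADFS server, step 1
Install-AdfsFarm -CertificateThumbprint $thumb -Credential $cred `
    -FederationServiceDisplayName 'Lab ADFS' `
    -FederationServiceName 'adfs.lab.local' -SSLPort 8443

# WAP server, step 1: fails the first time with the metadata error
Install-WebApplicationProxy -HttpsPort 8443 -CertificateThumbprint $thumb `
    -FederationServiceName 'adfs.lab.local' `
    -FederationServiceTrustCredential $cred

# Step 2: reports a failure but stores enough config (note the port in each URL)
Set-WebApplicationProxyConfiguration `
    -ADFSUrl 'https://adfs.lab.local:8443/adfs/ls' `
    -OAuthAuthenticationURL 'https://adfs.lab.local:8443/adfs/oauth2/authorize' `
    -ADFSSignOutURL 'https://adfs.lab.local:8443/adfs/ls/?wa=wsignout1.0'

# Step 3: re-run the install, which now succeeds
Install-WebApplicationProxy -HttpsPort 8443 -CertificateThumbprint $thumb `
    -FederationServiceName 'adfs.lab.local' `
    -FederationServiceTrustCredential $cred
```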

Despite the fact that step 2 says it failed, it seems to set enough information for step 3 to succeed. My experience, however, is that only doing steps 2 and 3 does not work. Weird!

As a side note, testing this lot is a lot easier if you remember that the idpinitiatedsignon.aspx page we all normally use for testing ADFS is disabled by default in Server 2016. Turn it on with Set-AdfsProperties -EnableIdPInitiatedSignonPage $true

Registering an agent with VSTS and getting the message "Agent pool not found"

When you want to register a build agent with VSTS, you use the VSTS instance’s URL and a user’s Personal Access Token (PAT). Whilst doing this today I connected to the VSTS instance OK but got the error “Agent pool not found” when I was asked to pick the agent pool to add the new agent to.

As the user whose PAT I was using was a Build Administrator I was a bit confused, but then I remembered to check their user access level. It was set to Stakeholder; once this was changed to Basic I was able to register the agent without issue.

Also, so as not to use up a Basic licence when I did not need to, I swapped the user back to being a Stakeholder once the agent was registered. This can be done because the token used for the actual build is not the one used to register the agent, but one assigned at build time by VSTS.
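For reference, the registration step itself looks something like this from the unpacked agent folder (instance, pool and agent names are placeholders; as noted above, the PAT is only used at registration time):

```powershell
# Register a build agent against a VSTS instance (sketch)
.\config.cmd --url https://myinstance.visualstudio.com `
    --auth pat --token '<PAT for a Basic user>' `
    --pool Default --agent BUILD01
```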

Experiences migrating TFS XML Team Project Templates to Inherited Team Project Templates

You have always been able to customise your Team Projects in TFS by editing a host of XML files, but it was not a pleasant experience. In VSTS a far more pleasant web-based inherited customisation model was added, much to, I think, most administrators’ relief.

If you used the TFS DB migration service you ended up with a VSTS instance full of the XML-style team projects, and you were stuck there, with no way to change these to the new inherited model – that is, until now, as Microsoft have released a conversion tool to preview.

I have been trying this tool to migrate all our active XML-based team projects to Inherited equivalents, and it has worked very well for me, but I have to say we don’t have any hugely complex customisations, so your mileage may vary depending on how much you modified your XML-based team project templates.

However, I wanted to go further than the basic process as documented.

Since moving to VSTS all our new team projects have been created using an inherited template based on Scrum. I wanted to move all our active XML-based team projects to this standardised template. This took a little extra work.

The basic process to change the Inherited Process is easy:

  1. In the instance admin page https://[instance].visualstudio.com/_settings/process pick the target template
  2. Click on the ellipsis (…) and pick ‘Change team project to use ….’
  3. Pick the team project you wish to migrate and you are done

REMEMBER: A really nice touch (with XML and Inherited templates) is that you can just switch back if you don’t like the result, no data is lost when you swap templates, but some of it might be hidden as fields are not shown on the new work item types.

The problem I had was that our old XML-based team project templates had different customisations to our current inherited standard. To address this I needed to:

  • Add a few fields to our standard inherited template based on Scrum, for critical legacy customisations
  • In one case a work item type in use in the XML template had no match in our current template. In this case I changed the work item type to its equivalent (we had used a variety of ‘flavours of PBI’ so this was not a major problem), adding a tag to the work items so they could be identified, then deleting the offending work item type.
  • I could conceive of a need for more complex remapping of work item types and fields, but I was able to avoid this.

So once done, I was then able to migrate all the team projects to my target process template.

So a very nice experience, and one that now means we can make sure all our team projects use the same set of customisations. No longer do we need to worry about customising in both XML and Inherited models.

Hello sign in does not work on my first generation Surface Book after a rebuild – fixed

I have just rebuilt my first generation Surface Book from our company standard Windows image. These images are used by all our staff all the time without issue (on Surface and Lenovo devices), so I was not expecting any problems.

I used to rebuild my PC every 6 months or so, but got out of the habit when I moved to a model I could not swap the hard drive out of as a backup during the process (using a new disk for the new install). I got around this by using Disk2VHD; not quite as good, as I can’t just swap the disk back in, but I won’t have lost any stray files – even though I always aim to keep data in OneDrive, source control or SharePoint, so it should not have been an issue anyway.

Anyway, the rebuild went fine, no issues until I tried to enable Hello to login using the camera. The process seemed to start OK, the wizard ran, but after a reboot there was no Hello login option.

After a bit of digging I found that in device manager there were no imaging devices at all – strange as the Camera App and Skype worked OK.

The Internet proved little help, suggesting the usual set of ‘you have a virus, install our tool’ fixes, but after much more digging around I found that the cameras were all under ‘System Devices’ in Device Manager. So I then…

  1. Uninstalled them all (front, front IR and back)
  2. Scanned for hardware changes (they reappeared, still in ‘System Devices’)
  3. Ran a Windows Update and some new Intel Drivers were downloaded
  4. I could then run the Hello setup wizard again; it seems all the old settings had been lost

That was all much more complex than hoped for. My guess is that the system rebuild changed some firmware in a strange way that makes for the misdetection of the cameras.

Anyway it is working now.

Windows 10: We can’t add this account

A colleague recently started seeing an issue when attempting to add their work account to a Windows 10 device. Following a device re-image (this, as we’ll see, becomes important…), they saw the following error reported when attempting to add the account:

Windows 10 we can't add this account

The full text of the error reads

We can’t add this account. Your organisation’s IT department has a policy that prevents us from adding this work or school account to Windows.

Initially we looked at whether recent policy changes had in fact impacted the ability to add a work account to a Windows 10 device, but were not seeing anything that appeared to impact this. We had other users who were receiving new PCs that were unaffected and had the same policies applied to them. In addition, nothing was showing up in the event viewer folder for Workplace Join on their machine when attempting to add the account.

Realising that the machine had just been reimaged, we checked in Azure AD to view the list of devices. The following is a screen shot for a different device ID, however this was similar to what we saw:

Device list in Azure AD

As can be seen, there are multiple instances of the ‘same’ machine. In each case, the machine has been reimaged and then had the work account added. In each case, Azure AD has obviously assigned a new device ID, hence what appears to be multiple copies of the same machine registered.

Once we’d deleted a few of the ‘old’ machines from the list, the user was able to successfully add their work account to the device.

There are a couple of potential solutions in our scenario:

  1. Periodically check the number of devices registered and trim as appropriate.
  2. Raise the limit of the number of devices that can be registered, either to a larger number, or to ‘unlimited’ in the device settings area of Azure AD.

Azure AD Device Settings
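Both options can also be scripted with the MSOnline module if you prefer that to the portal. A sketch, assuming you have the device-registration-capable MSOnline module and suitable admin rights (the UPN and limit are examples):

```powershell
# Sketch: review/trim a user's registered devices, or raise the limit
Connect-MsolService

# Option 1: list the user's devices, oldest logon first, then remove stale ones
Get-MsolDevice -RegisteredOwnerUpn 'user@mydomain.com' |
    Sort-Object ApproximateLastLogonTimestamp |
    Select-Object DisplayName, DeviceId, ApproximateLastLogonTimestamp
Remove-MsolDevice -DeviceId '<stale device id>' -Force

# Option 2: raise the per-user device registration limit
Set-MsolDeviceRegistrationServicePolicy -MaximumDevicesPerUser 50
```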