Fix for "Metadata generation failed. Exit code: -2147450750" caused by loading the wrong version of DLLs when building Azure Functions

The Problem

Recently, an Azure DevOps pipeline for a .NET 6 based Azure Functions project started to fail on some of our self-hosted build agents with the error:

##[error]C:\Users\Administrator\.nuget\packages\microsoft.azure.webjobs.script.extensionsmetadatagenerator\4.0.1\build\Microsoft.Azure.WebJobs.Script.ExtensionsMetadataGenerator.targets(37,5): Error : Metadata generation failed. Exit code: '-2147450750' Error: 'Failed to load the dll from [C:\hostedtoolcache\windows\dotnet\shared\Microsoft.NETCore.App\3.1.32\hostpolicy.dll], HRESULT: 0x800700C1 An error occurred while loading required library hostpolicy.dll from [C:\hostedtoolcache\windows\dotnet\shared\Microsoft.NETCore.App\3.1.32]'

The pipeline itself was simple, just repeating the steps a developer would use locally:

  - task: UseDotNet@2
    displayName: "Use .NET 6"
    inputs:
        packageType: sdk
        version: 6.x
        performMultiLevelLookup: true

  - task: DotNetCoreCLI@2
    displayName: "dotnet restore"
    inputs:
        command: restore
        projects: "$(Build.SourcesDirectory)/src/Api.sln"
        feedsToUse: "select"
        vstsFeed: "aaa33827-92e2-45a0-924a-925b0d6344677" # organisation-level feed

  - task: DotNetCoreCLI@2
    displayName: ".NET Build"
    inputs:
        command: "build"
        arguments: >
          --configuration ${{ parameters.buildConfiguration }}
          --no-restore
        projects: "$(Build.SourcesDirectory)/src/Api.sln"

The Cause

The issue was that dotnet build was picking up the .NET 3.1 version of hostpolicy.dll from the agent's tool cache, even though the pipeline was set to use .NET 6. When I looked, I could see both .NET 3.1 and .NET 6 SDKs in the cache folder.

The .NET 3.1 version was on the build agent because it is a self-hosted agent that had been used for .NET 3.1 builds in the past.

This would not have occurred with a Microsoft-hosted agent, as they are rebuilt for each run. Although we build our agent VM images using the same Packer process as the Microsoft-hosted agents, we do not rebuild them between runs, so the cache can contain a variety of tools and SDKs from past runs. An advantage of this approach is faster builds, as you don't have to download all the tools each time, but it can lead to issues like this.
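If you suspect the same problem, it can help to make the agent report what it actually has cached. The steps below are a minimal diagnostic sketch, not part of the original pipeline: they assume a Windows agent, that they run after the UseDotNet@2 step so that dotnet is on the path, and that the tool cache is in the folder shown in the error message. The display names are my own.

```yaml
  - script: |
      dotnet --info
      dotnet --list-sdks
      dotnet --list-runtimes
    displayName: "Show .NET SDKs and runtimes visible to the build"

  - powershell: |
      # Path taken from the error message; adjust it if your tool cache lives elsewhere
      Get-ChildItem -Path "C:\hostedtoolcache\windows\dotnet\shared\Microsoft.NETCore.App" -ErrorAction SilentlyContinue |
        Select-Object -ExpandProperty Name
    displayName: "List cached .NET runtime versions"
```

If a 3.1.x folder appears in the second step's output on an agent that should only be building .NET 6, the cache is a likely suspect.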

The Solution

The solution was simple: I deleted the cache on our build agents by deleting the contents of the folder C:\hostedtoolcache\windows\dotnet on each of them.

Once this was done the build worked as expected.

Given we don't create .NET 3.1 projects any more, deleting the old cache should be enough. However, I can see scenarios, e.g. building legacy projects, where a more complex solution to keep the cache 'cleaner' might be needed, but I will leave that for another day.

Update: 17 Aug 2023 was that other day...

We automated this process with this YAML fragment, placed at the start of the job before any version of .NET is installed:

  - powershell: |
      # Delete the contents of the hostedtoolcache\windows\dotnet folder
      $folder = Test-Path -Path "c:\hostedtoolcache\windows\dotnet"
      if ($folder) {
        Write-Host "Deleting c:\hostedtoolcache\windows\dotnet"
        Remove-Item -Path "c:\hostedtoolcache\windows\dotnet\*" -Recurse -Force -ErrorAction Ignore
      }
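A possible variation, if you would rather not hard-code the path, is sketched below. It is not what we run. It assumes UseDotNet@2 is installing into its default location of $(Agent.ToolsDirectory)/dotnet, and that on your agents this resolves to the same hostedtoolcache folder. The condition limits the step to Windows agents, and the display name is my own.

```yaml
  - powershell: |
      # Assumption: the .NET cache sits in a "dotnet" folder under the agent's tools directory
      $dotnetCache = Join-Path "$(Agent.ToolsDirectory)" "dotnet"
      if (Test-Path -Path $dotnetCache) {
        Write-Host "Deleting contents of $dotnetCache"
        Remove-Item -Path (Join-Path $dotnetCache "*") -Recurse -Force -ErrorAction Ignore
      }
    displayName: "Clear cached .NET installs"
    condition: and(succeeded(), eq(variables['Agent.OS'], 'Windows_NT'))

  - task: UseDotNet@2
    displayName: "Use .NET 6"
    inputs:
        packageType: sdk
        version: 6.x
```

The trade-off is the one described earlier: every run now downloads the SDK again, so builds are a little slower in exchange for a predictable set of installed versions.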