Caching NVD Vulnerability Dependency data on hosted Azure DevOps Pipeline agents

Background

On some projects we use Jeremy Long's DependencyCheck tool, via the Azure DevOps task, to scan our code for known vulnerabilities. The tool gets its data from the National Vulnerability Database (NVD), downloading it on demand from the NVD site.

Since the recent API changes on the NVD site, as supported by DependencyCheck 9.0.x, downloading the current vulnerability data has slowed from about 3 minutes to around 15 minutes, even with a valid NVD API key. So every pipeline build now spends around 15 minutes on this download, a very significant change if the rest of the build only takes a few seconds!

Solution for on-premises agents

For our on-premises pipeline agents, this is not as big a problem as it initially sounds. Each agent caches the NVD data locally, so once the first build has downloaded the data, subsequent builds use the local cache, updating it as needed.

To further reduce the impact on developers of fetching updated NVD data, we have a scheduled build running on each on-premises agent to make sure the cache is updated at least once a day. I previously posted on how to create your own Azure DevOps maintenance jobs for just this type of requirement.
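
As a rough illustration of such a scheduled refresh, a pipeline along the following lines could run nightly on a given agent. This is a minimal sketch, not our actual maintenance job: the pool name `OnPremPool`, the agent name `BuildAgent01`, the `main` branch and the cron time are all hypothetical placeholders, and it assumes the NVD API key is held in a pipeline variable called `nvdapikey`.

```yaml
# Hypothetical nightly job that refreshes the NVD cache on one agent
trigger: none                 # only run on the schedule, not on commits

schedules:
- cron: "0 2 * * *"           # 02:00 UTC every day (placeholder time)
  displayName: Nightly NVD cache refresh
  branches:
    include:
    - main                    # assumed default branch
  always: true                # run even when there are no code changes

pool:
  name: OnPremPool            # hypothetical agent pool name
  demands:
  - Agent.Name -equals BuildAgent01   # hypothetical agent name, pins the run to one agent

steps:
# running the normal scan is enough to make the task update its local NVD data
- task: dependency-check-build-task@6
  displayName: Refresh NVD data
  inputs:
    projectName: 'NVD cache refresh'
    scanPath: '.'
    format: 'HTML'
    additionalArguments: '--nvdApiKey $(nvdapikey)'
```

Because the demand pins the run to a single agent, a sketch like this would need repeating (or parameterising) for each agent in the pool.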

But what about hosted agents?

Unfortunately we cannot use the same approach for hosted agents. The problem is that after each pipeline run the hosted agent is destroyed, so the cache is lost. This means that each pipeline run has to download the NVD data.

But there is a solution: the Cache pipeline task. This task allows any user-defined agent data to be cached between pipeline runs, even on hosted agents. The limitations of the cache are that:

  • The cache is specific to a pipeline definition, so there is no sharing of the cache between pipeline definitions
  • The cache only lasts 7 days

But even with these limitations it is still a big improvement over downloading the data on each and every build. For a given pipeline definition that is run regularly, i.e. multiple times a week, the cache will be used for all but the first run.

The YAML to set up the cache is as follows. The important step is to find the location of the NVD cache using some PowerShell, as we don't know the exact path: it depends on the version of the DependencyCheck task being used.

steps:
# find the current location of the NVD cache (it is task version specific)
- powershell: |
    $nvdcachepath = $(Get-ChildItem "$(Agent.WorkFolder)\_tasks\dependency-check-build-task*\*.*.*\dependency-check\data").FullName
    echo "##vso[task.setvariable variable=nvdcachepath;]$nvdcachepath"
  displayName: Find the NVD Cache path

# create the cache
- task: Cache@2
  inputs:
    key: '"NVDCache" | "$(Agent.OS)"'
    restoreKeys: |
      NVDCache | "$(Agent.OS)"
      NVDCache
    path: $(nvdcachepath)
  displayName: NVD Cache

# No changes required from the standard task
- task: dependency-check-build-task@6
  displayName: "Vulnerability Scan Exploited Vulnerabilities update check"
  inputs:
    projectName: 'Maintainance'
    scanPath: '.'
    format: 'HTML'
    additionalArguments: '--nvdApiKey $(nvdapikey)'

# a special end of run task is automatically added at runtime to save the cache
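
To confirm the cache is actually being restored on a hosted agent, the Cache task's optional `cacheHitVar` input can be used. The following is a sketch of a variant of the cache step above, not something from our production pipelines: the variable name `NVD_CACHE_RESTORED` is my own choice for illustration, and it assumes `nvdcachepath` has already been set by the earlier PowerShell step.

```yaml
steps:
# variant of the cache step that records whether a cache was restored
- task: Cache@2
  inputs:
    key: '"NVDCache" | "$(Agent.OS)"'
    restoreKeys: |
      NVDCache | "$(Agent.OS)"
      NVDCache
    path: $(nvdcachepath)
    cacheHitVar: NVD_CACHE_RESTORED   # illustrative variable name
  displayName: NVD Cache

# log the result: 'true' for an exact key hit, 'inexact' for a restore key hit, 'false' otherwise
- powershell: |
    Write-Host "NVD cache restored: $(NVD_CACHE_RESTORED)"
  displayName: Report NVD cache status
```

On the first run of a pipeline definition this should report `false`, which is the run that pays the full download cost.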

So now we have a solution that works for both on-premises and hosted agents, hopefully saving 15 minutes on all but the first run of a given pipeline definition.