On a recent TFS consultancy job, I was asked to monitor how long builds spent waiting in the build queue before starting.
My plan was to use the TFS API to query all builds with a status of ‘Queued’ and monitor the wait times.
I wrote the code and everything seemed to work fine. However, after capturing a number of wait times and comparing them to the overall build times, I noticed that the numbers did not add up.
In fact, a build that was estimated to complete in 2 hours took more than 6 hours, yet did not spend any time ‘Queued’.
The build controller distributes builds across multiple agents and will start one build per build agent (assuming you haven't changed the default MaxConcurrentBuilds setting). For example, if you have 3 build agents and you start 3 builds, the controller will set all 3 builds to ‘InProgress’. If you start a 4th build, it will be ‘Queued’.
This works fine as long as any build agent can run any build definition.
Unfortunately, it does not take into account ‘Tags’ that may force certain builds onto specific agents.
Take the same setup of 3 build agents.
If you tag a build agent so that only certain builds can use it, and then start 3 builds that can only run on this tagged agent, you would expect only one build to be set to ‘InProgress’ while the other 2 remain ‘Queued’ until the agent finishes the 1st build.
However, the actual behaviour is that all 3 builds change to ‘InProgress’ at the same time (up to the MaxConcurrentBuilds setting on the build controller), but only the first build is actually doing anything. The other two builds are stuck waiting to be allocated an agent.
You look at your dashboard and see a list of builds ‘In Progress’ that are actually blocked waiting for a build agent.
In the screenshot above, only 213 is actually running, on Build Agent 1. (214 and 215 are blocked waiting for Agent 1 to become available.)
Worse still is 217: it can run on any build agent, yet it is stuck in a ‘Queued’ state while there are 2 idle build agents that could be running it. It cannot start because the MaxConcurrentBuilds value of 3 has been reached.
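To make the bottleneck easier to reason about, here is a small Python model of the behaviour described above. It is only a sketch of what I observed, not the TFS scheduler itself: the `Build` class, the `schedule` function and the `"special"` tag name are all made up for illustration, and the model assumes the controller promotes builds to ‘InProgress’ purely by count before any agent matching happens.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Build:
    number: int
    required_tag: Optional[str] = None  # None means any agent will do
    status: str = "Queued"
    agent: Optional[str] = None


def schedule(
    builds: List[Build], agents: Dict[str, Set[str]], max_concurrent: int
) -> Tuple[List[Build], List[str]]:
    """Hypothetical model: returns the builds plus the names of idle agents."""
    # Step 1 (assumed): the controller promotes builds to InProgress
    # by count alone, ignoring tags, until max_concurrent is reached.
    for build in builds[:max_concurrent]:
        build.status = "InProgress"

    # Step 2: only InProgress builds compete for an agent with a matching tag.
    free = dict(agents)
    for build in builds:
        if build.status != "InProgress":
            continue
        match = next(
            (name for name, tags in free.items()
             if build.required_tag is None or build.required_tag in tags),
            None,
        )
        if match is not None:
            build.agent = match
            free.pop(match)
    return builds, list(free)


# Same situation as in the screenshot: 3 agents, one of them tagged.
agents = {"Agent 1": {"special"}, "Agent 2": set(), "Agent 3": set()}
queue = [
    Build(213, "special"),
    Build(214, "special"),
    Build(215, "special"),
    Build(217),  # can run anywhere
]
builds, idle_agents = schedule(queue, agents, max_concurrent=3)

for b in builds:
    print(b.number, b.status, b.agent)
print("Idle agents:", idle_agents)
```

In this model 213 gets Agent 1, 214 and 215 sit ‘InProgress’ with no agent, and 217 stays ‘Queued’ while Agent 2 and Agent 3 are idle, which is the situation on my dashboard.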
Be very careful with the use of tags. In future I will try to avoid tags wherever they could introduce the above bottleneck.
Additionally, when using the TFS API to capture metrics on wait times, you cannot rely on ‘Queued’ builds only. Instead, I’ll query all builds assigned to a controller and then filter out those that have already been assigned a build agent. What remains is the accurate list of pending builds.
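The filtering idea can be sketched in a few lines of Python. This is not TFS API code: the dictionaries below stand in for whatever the real query returns, and the field names (`status`, `agent`, `queued_at`) and the sample timestamps are assumptions made up for the example.

```python
from datetime import datetime, timedelta


def pending_builds(builds):
    """Builds that are not finished and have no agent yet, whatever their status says."""
    return [
        b for b in builds
        if b["status"] in ("Queued", "InProgress") and b["agent"] is None
    ]


def wait_times(builds, now):
    """Map build number -> time spent waiting so far (hypothetical 'queued_at' field)."""
    return {b["number"]: now - b["queued_at"] for b in pending_builds(builds)}


# Illustrative data only, mirroring the builds discussed above.
now = datetime(2024, 1, 1, 12, 0)
sample = [
    {"number": 213, "status": "InProgress", "agent": "Agent 1",
     "queued_at": now - timedelta(minutes=90)},
    {"number": 214, "status": "InProgress", "agent": None,
     "queued_at": now - timedelta(minutes=60)},
    {"number": 215, "status": "InProgress", "agent": None,
     "queued_at": now - timedelta(minutes=30)},
    {"number": 217, "status": "Queued", "agent": None,
     "queued_at": now - timedelta(minutes=10)},
]

for number, waited in wait_times(sample, now).items():
    print(number, waited)
```

The point of the design is that the status is ignored as a signal of progress: a build counts as waiting until it actually holds an agent, so 214 and 215 are reported alongside 217.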