User profile service failures caused by Distributed Cache on SharePoint 2013
This is a relatively quick post, largely because I was not able to record the details of the error messages we saw in the farm before we fixed them.
On a recent engagement we were investigating problems with a SharePoint 2013 farm that had been installed for our customer by a third party. We worked through a number of issues, but I wanted to record this one for the community.
The user profile service in the farm was not responding. Attempts to manage the service resulted in an error page. In addition, the user profile synchronisation service was stuck in the starting state. Documentation suggested that both had been configured during installation and checked to be working.
We ran through a number of checks to no avail. Then I started looking at how the farm topology was configured.
The distributed cache service had been stopped on the two web front-end servers and the two app tier servers in the farm. Two additional SharePoint servers were in the farm, but they ran no services at all other than the distributed cache. As an experiment we started the cache service on the four web front-end and app tier servers, and the user profile service sprang into life. The synchronisation service also started without error.
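For anyone who wants to run the same sanity check against their own inventory, here is a minimal sketch in Python. It assumes you have already exported which services are started on each server. The server names, the `farm` dictionary and the `servers_missing_cache` helper are all hypothetical, and I am assuming the user profile services sat on the app tier, which I did not record. It flags servers worth a closer look rather than proving a requirement.

```python
# Hypothetical inventory: server name -> set of services in the "started" state.
# Server names and the service placement are illustrative, not taken from the real farm.
DISTRIBUTED_CACHE = "Distributed Cache"
USER_PROFILE_SERVICES = {
    "User Profile Service",
    "User Profile Synchronization Service",
}


def servers_missing_cache(farm):
    """Return servers that run a user profile service but not the distributed cache."""
    return sorted(
        server
        for server, services in farm.items()
        if services & USER_PROFILE_SERVICES and DISTRIBUTED_CACHE not in services
    )


# Assumed layout resembling the farm described above: cache stopped on the
# web front-end and app tier servers, running only on two dedicated servers.
farm = {
    "WFE1": {"Microsoft SharePoint Foundation Web Application"},
    "WFE2": {"Microsoft SharePoint Foundation Web Application"},
    "APP1": {"User Profile Service", "User Profile Synchronization Service"},
    "APP2": {"User Profile Service"},
    "CACHE1": {"Distributed Cache"},
    "CACHE2": {"Distributed Cache"},
}

print(servers_missing_cache(farm))  # ['APP1', 'APP2']
```

An empty list means every server hosting a user profile service also has the cache running, which is the state our farm was in once the services came back.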
I haven’t had a chance to recreate this issue, so I can’t say whether there was a server-to-server communication problem or whether the distributed cache service actually needs to be running on the servers that host the user profile service and the user profile synchronisation service. Either way, if you have strange service issues with 2013, check your cache topology!