SharePoint Website Schematic

I find myself drawing the same diagram over and over again in meetings to explain how SharePoint sites relate to IIS web sites, how managed paths and alternate access mappings fit and why you need to extend the SharePoint web application if you want more than one authentication provider.

After some of my colleagues pestered me to draw it again, I decided to create an electronic version, and since everybody seems to find it so useful I thought I’d post it here as well.

Rather than talk about it, I’m going to post it ‘blind’ and invite comments from you, my enthusiastic audience, on how easy it is to understand and on any errors or omissions there may be.

With a little luck, somebody will find it useful!

[Diagram: SharePoint Website Architecture]

Site Policies and FBA in SharePoint: Update

My apologies to Craig, who posted a comment on my earlier post about our FBA problems which I didn’t notice until today.

To update you all on the situation, the fault is still with Microsoft and I have not yet received a hotfix.

However, for anybody considering FBA in their deployment, I would not let this issue stop you. There are two reasons I say that:

  1. Normally with FBA you would extend the web application in question, having both FBA and Windows authentication available on the same content via different URLs. This makes your life easier with things like indexing and management.
  2. The workaround I detailed is a good temporary solution to the problem with only minimal impact on the user experience (in that certain options are offered which may not work too well in Office when using FBA).

Hopefully this answers Craig’s question and assuages any doubts about the wisdom of deploying FBA in your SharePoint solution.

Workflow and SQL Error: Update

I posted last week about a couple of issues we were experiencing with SharePoint. I gained some traction on the Workflow History issue at the end of last week and the revelation was pretty far-reaching, so I’m posting again.

It turns out that the stuff I said about SystemUpdate was wrong… up to a point.

There is a bug with SystemUpdate and triggering events, but it’s not the one we thought it was! It turns out that the behaviour we are seeing is correct – SystemUpdate is supposed to trigger events, just not update things like the modified by and last updated columns. It’s actually the behaviour within a workflow which is at fault, in that events aren’t being triggered when they should be.

I had a chat with our developers about this and they told me that there are plenty of articles on the web suggesting that SystemUpdate is the way to update an item in a list without triggering events. Don’t do it! I was told by Microsoft that whilst the fault is not high on the list because there is a workaround (which I will list in a moment), it will be fixed. At that point, anybody who is using SystemUpdate expecting events not to fire will get a shock.

The MSDN documentation for SystemUpdate is pretty clear:

When you implement the SystemUpdate method, events are triggered and the modifications are reported in the Change and Audit logs, but alerts are not sent and properties are not demoted into documents.

The explanation as to why events don’t fire is:

When you used in other places such as windows/console app, another workflow or webparts, you are not seeing the event trigger the workflow, this is due to the Workflow runs on separate threads from the main thread, so we cannot fire up the workflow and simply quit. Quitting an app before the async worker threads are finished causes those threads to simply abort, and in the case of workflow, nothing will appear to have happened.

And the fix:

Currently, all standalone applications must call spsite.workflowmanager.Dispose(). This call waits for the threads to complete and causes workflow to go into an orderly shutdown.
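
The explanation and fix above describe a general pattern: work started on background threads is lost if the host process quits first, unless something waits for those threads. The sketch below illustrates that pattern only. It is written in Python with ordinary threads standing in for the workflow workers, and the class and method names (WorkflowManagerSketch, start_workflow, dispose) are hypothetical, not SharePoint APIs.

```python
import threading
import time


class WorkflowManagerSketch:
    """Hypothetical stand-in for a manager that runs work on background threads."""

    def __init__(self):
        self._threads = []
        self._lock = threading.Lock()
        self.completed = []

    def start_workflow(self, name, duration=0.05):
        # daemon=True means the thread is killed if the process exits first,
        # which mirrors async workers being aborted when the app quits.
        worker = threading.Thread(
            target=self._run, args=(name, duration), daemon=True
        )
        self._threads.append(worker)
        worker.start()

    def _run(self, name, duration):
        time.sleep(duration)  # pretend to do the workflow's work
        with self._lock:
            self.completed.append(name)

    def dispose(self):
        # Block until every worker has finished: an orderly shutdown.
        for worker in self._threads:
            worker.join()


manager = WorkflowManagerSketch()
manager.start_workflow("sync-enquiry")
manager.start_workflow("send-escalation")
manager.dispose()  # without this wait, exiting now could lose both workflows
print(sorted(manager.completed))
```

If the call to dispose() is removed and the script ends immediately, the daemon threads may be killed before they record anything, which is the "nothing will appear to have happened" symptom described in the explanation.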

And the solution to the problem of wanting to not trigger events? Well, it looks like the method I described in my earlier post is the way to go.

Workflow History and SQL Error

When trying to view an item in a list which has workflows run against it, you get an error:

Some part of your SQL statement is nested too deeply. Rewrite the query or break it up into smaller queries

Problem Background

Trying to explain the exact nature of our configuration in this case would break many people’s heads. This, therefore, is a bit of a simplification.

We have a custom webpart which allows users to log an enquiry. We create an item in a list with the enquiry details, and send an email to the account responsible for dealing with those enquiries. A copy of the list item is created in another list (we’ll leave out the why and wherefore of that for now). The two copies must be kept in sync. More details on that later.

Those enquiries must be closed within 30 minutes. If not, an escalation email is sent. An enquiry is closed if a particular column changes value. To ensure the two lists are kept in sync, when an item is changed a workflow is triggered. If the column we care about has changed we sync up the item in the other list.

The escalation process is a timer. It checks the items and sends emails. It updates a column with the time of the last email sent so we can repeat the process every 30 minutes.

What we found was that when enquiries were left open for a few days, we could no longer access the enquiry item at all via the web interface (although datagrid view still worked!). We saw the error at the top of this post.

The Root of the Matter

This fault is currently with our Microsoft Support team and they are working through it. I do, however, have enough knowledge and understanding of why the fault occurred to explain it, and a few dirty hacks to avoid it.

The reason we can’t access the items is that when SharePoint pulls up the item for edit/view it checks the Workflow History for that item. If there are more than about 200 entries for that item in the Workflow History list, we get the SQL query error and boom! That’s the long and short of it.

The deeper question is why? More importantly, why do we have over 200 workflows running on the item?

Workflow History first. The Workflow History list is a hidden list which does exactly what it says on the tin. Items are created each time a workflow runs. It turns out that items in the Workflow History list have a time-to-live and that time is 60 days. That means that any item in the list will automatically be deleted after 60 days. With roughly a 200 item limit before you hit trouble, that means about 3 workflows per list item per day is your maximum.
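
The arithmetic behind that figure can be checked with a few lines of Python. The 200-entry limit and the 60-day time-to-live come from the paragraph above. The 48 runs per day is an assumption for illustration: it supposes that every 30-minute timer update triggers exactly one workflow run, and that each run adds one history entry.

```python
# Figures from the post: roughly 200 history entries before the SQL error,
# and a 60-day time-to-live for Workflow History items.
HISTORY_ENTRY_LIMIT = 200
CLEANUP_DAYS = 60

# Assumption for illustration: one workflow run per 30-minute timer update.
RUNS_PER_DAY_FROM_TIMER = 24 * 60 // 30


def max_runs_per_day(limit, cleanup_days):
    """Sustainable workflow runs per item per day before the limit is hit."""
    return limit / cleanup_days


def days_until_limit(limit, runs_per_day):
    """Days of continuous running before an item reaches the limit."""
    return limit / runs_per_day


print(round(max_runs_per_day(HISTORY_ENTRY_LIMIT, CLEANUP_DAYS), 2))
print(RUNS_PER_DAY_FROM_TIMER)
print(round(days_until_limit(HISTORY_ENTRY_LIMIT, RUNS_PER_DAY_FROM_TIMER), 1))
```

Under those assumptions the sustainable rate is a little over 3 runs per day, while an item that stays open would reach the limit in roughly four days, which fits with enquiries becoming inaccessible after being left open for a few days.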

Personally, I think that is a scalability issue. I can envision a scenario where we might want to run that many workflows by design, perhaps more.

Back to the plot. I suspect you’re sitting there thinking that in our case, having that many workflows run is bad design or a fault. Well, you’re not wrong, although you’re not quite right either.

We knew when we built the workflow that we had to avoid circular references and update the lists as little as possible. There is code to make sure that changes made by the workflow itself are not reflected back, and if the change is not to the column we care about then the workflow exits cleanly.

We also knew that because the timer job updates a different column in the list item, that would trigger the change event on the list item, running the workflow. As a result the timer performs a system update on the list item which should not trigger events (and indeed does not when we have used the method elsewhere).

What this means is that we actually have two problems:

  1. The system update method when used in our timer is not working correctly and events on the list item are being triggered. This means that the workflow is running too often.
  2. The issue with Workflow History means that we very quickly hit the 200 item limit and meet our end with the SQL query error.

A Legion of Dirty Hacks

As I write, these issues are with Microsoft Support who are ably working to resolve them. In the meantime, we have made the problem go away with two approaches, both of which I regard as dirty hacks.

The Workflow History Conundrum

Whilst investigating this problem I came across a discussion on the TechNet support forums. Ironically this was coming at the same problem but from a wholly opposite angle, whereby people wanted to keep items in the Workflow History list for longer!

What I found in that discussion was a post by Fred Morrison containing a PowerShell script. I am re-posting it here for completeness in case the forum disappears, but all credit to Fred for this – I didn’t write it!

    # SPAdjustAutoCleanupDays.ps1
    # Author: Fred Morrison, Senior Software Engineer, Exostar, LLC
    #
    # Purpose: Adjust SharePoint Workflow Association AutoCleanupDays value, where necessary
    # on all workflow associations for a specified List.
    #
    # Parameters:
    # siteName - The SharePoint Site to look at
    # listName - The SharePoint List to look at
    # newCleanupDays - The number of days to set the workflow association AutoCleanupDays value to, if not already set.
    #
    # Example call: SPAdjustAutoCleanupDays http://workflow2/FredsWfTestSite FredsNewTestList 180
    #
    # following makes it easier to work with SharePoint and also means you have to run this script on the SharePoint server
    [void] [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint") | Out-Null
    # capture command line arguments
    $siteName = $args[0] # ex: http://workflow2/FredsWfTestSite/
    $listName = $args[1] # ex: FredsNewTestList
    [int] $newCleanupDays = [System.Convert]::ToInt32($args[2]) # ex: 1096
    Write-Host $siteName
    Write-Host $listName
    Write-Host $newCleanupDays
    # get a reference to the SPSite object
    $wfSite = New-Object -TypeName Microsoft.SharePoint.SPSite $siteName
    [Microsoft.SharePoint.SPWeb] $wfWeb = $wfSite.OpenWeb()
    Write-Host $wfWeb.ToString()
    # get a reference to the SharePoint list we wish to examine
    [Microsoft.SharePoint.SPList] $wfList = $wfWeb.Lists[$listName];
    Write-Host $wfList.Title
    [Microsoft.SharePoint.Workflow.SPWorkflowAssociation] $wfAssociation = $null
    [Microsoft.SharePoint.Workflow.SPWorkflowAssociation] $a = $null
    [int] $assoCounter = 0
    [string] $message = ''
    # Look at every workflow association on the SPList and make sure the AutoCleanupDays value is correctly set to the desired value
    for( $i=0; $i -lt $wfList.WorkflowAssociations.Count; $i++)
    {
        $a = $wfList.WorkflowAssociations[$i]
        [string] $assocName = $a.Name
        Write-Host $a.Name
        if ( $a.AutoCleanupDays -ne $newCleanupDays )
        {
            $oldValue = $a.AutoCleanupDays
            $a.AutoCleanupDays = $newCleanupDays
            # save the changes
            $wfList.UpdateWorkflowAssociation($a)
            $message = "Workflow association $assocName AutoCleanupDays was changed from $oldValue to $newCleanupDays"
        }
        else
        {
            $message = "Workflow association $assocName AutoCleanupDays is already set to $newCleanupDays - no change needed"
        }
        Write-Host $message
    }
    # release the SPWeb and SPSite objects now that we have finished with them
    $wfWeb.Dispose()
    $wfSite.Dispose()
    Write-Host 'Done'

I simply ran that script on our system, setting the value for newCleanupDays to 1. I waited a day and voila! All the list items were now accessible. Note that, as repeated in the forum discussion time and again, messing about with this is not a good idea. I simply have no choice right now.

The Timer Incident

It was all very well fixing the Workflow History list, but we really shouldn’t be seeing all those workflows in the first place. For some reason, our method of updating the list item from the timer, whilst being the official approach, triggered the workflow anyway.

To the rescue, a method we found on the blog of Paul Kotlyar. In that post, Paul talks about disabling event firing for the list item to ensure that no events get triggered. Why do I think this is a hack? Because the functionality is not normally found in workflows and timers – the method is part of SPEventReceiverBase.

Where Do We Go From Here?

Right now, I have support cases logged with Microsoft and engineers are working on the matter. We’ve already been via the SQL team, who looked at the original query that triggered the whole shebang, and they have returned an updated query for the SharePoint guys to look at. We also need to get to the bottom of a ‘correct’ way of updating list items without triggering events. As soon as I get a resolution from Microsoft, I will let you know.

Problems with Site Policies and FBA in SharePoint 2007

If you are using Forms Based Authentication and try to access Site Policies you may well find that you get an Access Denied response. If you do, this post will help you!

I’ve been meaning to post this for a while because I’m sure it may help somebody. As usual, it’s been pushed back and back until now I finally have some time. I also have another, workflow-related post on another problem which will follow shortly.

Problem Background

We have been working on a SharePoint solution which involves a large number of web applications, each of which is hosting an internet-facing site using SharePoint Publishing. Each of these web applications uses forms-based authentication (FBA) with the AD provider in order to provide a better user experience.

Why a large number of web applications? The format of the required URLs, and the number of URLs per site, led us down that path.

Why only FBA and not extend the web app to have one with FBA and one with windows auth? There are enough web applications already, and since we would almost never use the windows auth ones why load up the system?

One of the roles of these sites is to allow end users to upload files. However, our customer wants the document library that is used for upload to automatically purge files after a given time.

No problem, site policies will do that…

The Problem Itself

Unfortunately, when you authenticate as the site collection administrator, or a member of the site owners group, or a farm admin, you can’t access the Site Policies administration pages within the site collection hosted in the web application that uses FBA.

We spent a while checking group memberships and trying different users, and we established that toggling between FBA and windows auth would make the problem go away. We then called in the Microsoft Support guys and continued to investigate. Those guys deserve a mountain of praise as they carefully replicated the fault and kept us firmly in the loop whilst working on the call.

Using Gary Lapointe’s stsadm extensions we were able to examine the rights granted to the admin user when the site used FBA. The gl-enumeffectivebaseperms command showed interesting results.

stsadm -o gl-enumeffectivebaseperms -url <url>

FullMask

stsadm -o gl-enumeffectivebaseperms -invert -url <url>

UseClientIntegration, FullMask

The first command tells me what rights I have. The second tells me what rights I don’t. Note the confusion!

It turns out that when using FBA, the rights are not correctly applied for full control. The Microsoft team provided a workaround, in that enabling Client Integration fixes the application of rights. You can do this either by using stsadm to reset the authentication provider:

stsadm -o authentication -url <url> -type forms -membershipprovider <membership provider> -rolemanager <role manager> -enableclientintegration

or through Central Administration. We have lots of sites, so we take the stsadm route.

The Long-Term Solution

I have been informed by Microsoft Support that this has now been logged as a bug with SharePoint and that they are working on a fix. As soon as I know more, I will post again.

A great article on handy SharePoint controls

I don’t know about you, but I always mean to gather various bits of knowledge into one place, but just like tidying my filing at home, I never quite get around to it. Fortunately for me, Chris O’Brien is a bit more organised and in my ever expanding blogroll today I saw a great article about really useful SharePoint controls to use in custom pages for that handy bit of functionality.

Project Partner Day

Well, it’s the end of day zero, the partner-only day here at the Madrid Project Conference. It’s been an interesting day. I’m not sure what I am allowed to say, but service pack 1 for Office 2007, which covers the desktop products, SharePoint, Project Server et al, is very close to being available now. That was an interesting announcement, as we are looking at installing Project Server at Black Marble. I’d like to wait for SP1 – it makes sense – but because SharePoint will be patched at the same time I need to do some testing of our customisations first.

Meanwhile, outside the conference, we managed to leave the hotel for a few hours this morning before the partner event. The part of Madrid we are in has an incredible amount of building work underway, and all the roads are dual carriageways with big cloverleaf junctions. A fifteen minute taxi ride to the local shopping centre would probably have been a ten minute walk, had we realised where the shopping centre was in relation to the hotel. Ah well!

Tomorrow, the conference starts in earnest and I am planning to follow the system administration track, leaving the project management stuff to Paul and Jim. There are some interesting sessions ahead…

EMEA Project Conference – Madrid

Finally, after all the excitement that Richard and Robert had in Seattle and Barcelona, I find myself in the Auditorium Hotel, Madrid for the EMEA Project Conference.

According to the multilingual sales blurb in my room, the hotel is the largest in Europe, and I must say it’s very nice. We flew in yesterday and today is an MS Partner-only day before the conference itself kicks off tomorrow.

Project Server is something we’re very interested in using ourselves, and its integration with SharePoint (MOSS/WSS) makes it an attractive solution to anybody who has already deployed MOSS for their corporate intranet, as we have.

Also on the agenda today is VSTS integration with Project Server, which I’m keen to see more on. Closing the loop between developer activity and project planning and monitoring can make a big difference to whether a project comes in on time and budget.

I’m here for the SysAdmin track, whilst Paul and Jim cover the managerial and best practice side of things. I’ll do my best to blog on what I see, although it’s a pretty packed few days, ending in a good sprint from the end of the last session at 3:15 on Wednesday to make it to the airport in time for our 5:25 flight back to Blighty.

SharePoint problems with access rights

I spent a while knocking my head against a problem with a SharePoint server farm that’s worth posting about. It’s also worth a big hats-off to our Technical Support Coordinator at Microsoft Partner Support who dredged up the article that finally pointed us in the right direction.

The problem

I’ll post later about our approach to SharePoint installations, but I’ll summarise thus: We create multiple user accounts for the SharePoint services – a db access account, an installation account and so on. In this instance we were building a three server setup – db server, web server, index server. The accounts were created first, logged in as a domain admin. I also installed SharePoint as the domain admin, but didn’t run the config wizard.

I then logged in as our installation user, which has local admin rights to the two servers and dbcreator and securityadmin roles in the SQL server. I ran the config wizard on the web server and created the farm, specifying the db access account for (shock!) db access! The web server got to host the central admin site, which was tested and worked.

Before doing anything else I ran the config wizard on the second server and connected to the farm. At this point I had three servers listed in the Central Admin site, and it was time to configure services.

At this point we hit the snag – when I tried to configure the Office Server Search Service to run on the second server I got a SharePoint page telling me access was denied (‘The request failed with HTTP status 401: Unauthorized’). There was a similar error in the event log with an event ID of 1314, and we also found an event log error with ID 5000.

I bashed my head against this for a while, checking user rights, group memberships and stuff. I checked the DCOM IIS WAMREG activation rights for the users that the app pools were running as and just in case did an aspnet_regiis -ga <username> for those accounts to ensure that all the .Net registrations and rights were correct. No success.

I removed SharePoint and reinstalled the farm with the roles reversed. The fault moved to the other server. I confirmed that I could configure the service on the same server as the central admin site but never on the other server. I looked at the system registry, compared service configurations with a working system and tried manually hacking the config to no effect.

In the end I uninstalled everything, installed the farm clean and unconfigured and called in air support.

The fix

I can’t praise our support guy at Microsoft enough. He’s incredible – I emailed him and got a phone call within five minutes! We ran through the problem and he consulted his support resources. What he came back with took a few goes to make stick, but it worked, and in fixing SharePoint pointed to the root of the problem.

The solution is to edit the web.config for the Office Server Web Services site. On our system that file is in C:\Program Files\Microsoft Office Servers\12.0\WebServices\Root. The original file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <configSections>
        <sectionGroup name="microsoft.office.server" type="Microsoft.Office.Server.Administration.OfficeServerConfigurationSectionGroup, Microsoft.Office.Server, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" >
            <section name="sharedServices" type="Microsoft.Office.Server.Administration.SharedServiceConfigurationSection, Microsoft.Office.Server, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" />
        </sectionGroup>
    </configSections>
    <system.web>
        <authorization>
            <allow roles=".\WSS_ADMIN_WPG" />
            <deny users="*" />
        </authorization>
        <webServices>
            <protocols>
                <clear />
                <add name="AnyHttpSoap" />
                <add name="Documentation" />
            </protocols>
        </webServices>
    </system.web>
</configuration>

The solution is to edit the <authorization> section, adding entries to grant access to the user accounts for installation and db access:

        <authorization>
            <allow roles=".\WSS_ADMIN_WPG" />
            <allow users="ondemand\MOSSdba" />
            <allow users="ondemand\MOSSsetup" />
            <deny users="*" />
        </authorization>

However, the gotcha is that SharePoint puts the settings back – don’t do an iisreset; don’t recycle the app pool. Simply edit the file, then go to the page to configure the search service and it works. Once you’ve done that the service will start.

I then found that I couldn’t get back into the page because the web.config got reset (grr), but that’s not important right now.
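
Because SharePoint resets the file, the edit may need to be repeated, so it can be worth scripting. What follows is a minimal sketch in Python rather than anything we actually ran. The function name grant_users and the cut-down SAMPLE document are made up for illustration, and a real script would read and write the web.config file at the path given above. It adds an allow entry for each account ahead of the first deny entry and skips accounts that are already present.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the real web.config (illustrative only).
SAMPLE = r"""<configuration>
  <system.web>
    <authorization>
      <allow roles=".\WSS_ADMIN_WPG" />
      <deny users="*" />
    </authorization>
  </system.web>
</configuration>"""


def grant_users(config_xml, accounts):
    """Return config_xml with <allow users="..."/> added before the first <deny>."""
    root = ET.fromstring(config_xml)
    auth = next(root.iter("authorization"), None)
    if auth is None:
        raise ValueError("no <authorization> section found")

    already_allowed = {el.get("users") for el in auth.findall("allow")}
    children = list(auth)
    # Insert ahead of the first <deny>, or at the end if there is none.
    insert_at = next(
        (i for i, el in enumerate(children) if el.tag == "deny"), len(children)
    )
    for account in accounts:
        if account in already_allowed:
            continue  # keep the edit idempotent
        auth.insert(insert_at, ET.Element("allow", {"users": account}))
        insert_at += 1
    return ET.tostring(root, encoding="unicode")


updated = grant_users(SAMPLE, [r"ondemand\MOSSdba", r"ondemand\MOSSsetup"])
print(updated)
```

Placing the new entries before the deny entry matters, since an allow that comes after a matching deny would never be reached.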

The cause

The key in all this is that the two users I added explicit rights for were members of the WSS_ADMIN_WPG group specified in the original file. This pointed at an issue with the domain – the server was failing to get a list of members for that group.

The servers themselves were built and managed by our customer’s hosting provider, so I passed the fault to them. They checked the systems and found a domain fault affecting synchronisation. Result!
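
To see why the explicit entries helped, it is useful to remember that ASP.NET works through the authorization rules in order and stops at the first one that matches. The Python sketch below models just that behaviour. The is_authorized function is hypothetical, and it assumes that a failed group lookup simply leaves the user with no resolved roles, which is my reading of the fault rather than something Microsoft confirmed.

```python
# Each rule is (action, kind, value), in the order it appears in web.config.
ORIGINAL_RULES = [
    ("allow", "roles", r".\WSS_ADMIN_WPG"),
    ("deny", "users", "*"),
]
PATCHED_RULES = [
    ("allow", "roles", r".\WSS_ADMIN_WPG"),
    ("allow", "users", r"ondemand\MOSSdba"),
    ("allow", "users", r"ondemand\MOSSsetup"),
    ("deny", "users", "*"),
]


def is_authorized(rules, user, resolved_roles):
    """Return True if the first rule matching this user is an allow rule."""
    for action, kind, value in rules:
        matches = (kind == "roles" and value in resolved_roles) or (
            kind == "users" and value in ("*", user)
        )
        if matches:
            return action == "allow"
    return False  # sketch-only default; both rule sets above end in deny *


user = r"ondemand\MOSSsetup"
healthy_lookup = {r".\WSS_ADMIN_WPG"}  # group membership resolved normally
failed_lookup = set()                  # assumed result of the domain fault

print(is_authorized(ORIGINAL_RULES, user, healthy_lookup))
print(is_authorized(ORIGINAL_RULES, user, failed_lookup))
print(is_authorized(PATCHED_RULES, user, failed_lookup))
```

With a healthy lookup the original rules let the installation account in through its group membership. With the lookup failing, the only rule that matches is the deny entry, and the explicit allow entries restore access without relying on the group at all.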

SharePoint 2007 on x64 – don’t try to run 32-bit web apps!

We’re slowly migrating services onto our new servers here at Black Marble. This morning we had one of those moments where significant amounts of wall kicking and teeth gnashing ensue.

Basically, we forgot that if you enable 32-bit .net support on IIS 6 it disables 64-bit support – you can’t run 32 bit and 64 bit apps concurrently.

We spent a long time the other week getting our release version of SharePoint 2007 installed on one of our shiny Sun X2100 x64 servers. We expect the site to be quite large, so it made sense to run the x64 version of SharePoint.

Unfortunately, when 32-bit .Net apps were enabled by mistake, SharePoint and the other 64-bit web apps all stopped. Removing .Net 1.1 and running the aspnet_regiis -i command from the x64 .Net 2 framework folder got us back up and running, but SharePoint refused to allow anyone to log in.

Fortunately the Central Admin site was still working, so I had a rummage. It looked like SharePoint was no longer talking to our AD, so I went to the Application Management site, and went to the Authentication Providers option in the Application Security section.

In here you can edit the settings for each Web Application. I went through each of ours, and clicked on the ‘Default’ zone which is listed in the Web Application page.

I didn’t need to change anything – simply hit the save button and SharePoint seemed to rewrite its settings. Once this was done, our SharePoint started talking to people again.

Now, I don’t expect you to hit the same crazy situation as we did, but it’s nice to know that you can coax SharePoint back into life without restoring stuff from backup.