Analysing Active Directory

I think I’ve mentioned before how I’ve been updating our IT infrastructure. Company growth has meant a need for expanded services. Add to that new versions of SharePoint and Exchange, mix in a need to run virtual servers for development and you have a need for more tin.

Over the past six months I’ve expanded our domain to keep pace with our growing needs. The number of physical servers we have has increased, with a few more virtual servers for specific roles that I prefer to keep separate but which don’t really merit their own box.

As part of this growth, I added a second domain controller. Our existing DC was also running Exchange 2003, and this situation has caused me the most headaches in the sliding block puzzle of service upgrade and migration: We couldn’t demote the DC on our old server because of Exchange 2003, but I was reluctant to put in Exchange 2007 until I had redundancy of critical services (DC, DNS, etc).

Updating Domains, getting ready for Exchange

I will admit at this point that my knowledge of AD is not as deep as I would like, although it is increasing daily. That does mean, however, that I check before I leap – find articles on MSDN, TechNet and the wider blogosphere to find the pitfalls so I avoid pratfalls.

So, I read carefully about raising the functional level of the Forest and Domain when installing a 2003 R2 domain, made sure everything was patched and service packed before starting, read and re-read the instructions. When confident I had run through all the prerequisites I ran dcpromo to add my domain controller.

I was then left with two servers, both of which had the necessary tools to manage AD, both of which were registered in DNS as DCs, both of which appeared to be fine.

Nothing I read suggested that I needed to check anything else to make sure the process had completed… (You can see where this is going, can’t you…?)

Exchange 2007 – the big transition

Over the first weekend in April we transitioned from Exchange 2003 to Exchange 2007. Once again, I did my reading. I ran the Exchange Best Practices Analyzer and made sure that our Exchange 2003 installation was in tip-top condition. I compared two or three different sets of instructions on how to run through the process, settling on one from an Exchange community site because of some extra little nuggets of insight it contained.

The transition went relatively smoothly. The new server went in, was configured correctly and the Exchange 2007 site was connected to the Exchange 2003 site. Mailboxes were transferred (we had a problem with one, but we fixed it) and clients were checked to have connected to the new server.

Once happy, we uninstalled the old Exchange, as per instructions.

It took a full day, but we were being careful and thorough. We thought it had gone fine.

The next step would be to remove our old DC from the AD and decommission the server. Being cautious, we wanted to test that things wouldn’t stop if we removed the old DC, so we unplugged the network cable…

Chaos!

Everything stopped – Exchange clients disconnected, logons stopped, everything!

Is there a doctor in the house?

Stage one when hit with a problem – gather as much information as possible.

We looked at our systems, we checked logs, we watched the Outlook clients connecting to Exchange. When we disconnected our old DC, nothing seemed to want to talk to the new DC. I checked the Exchange server settings and made sure the server was set to use the new DC for its configuration and all seemed fine.

We noticed an error saying that the clients couldn’t connect to a Global Catalog server, so I did some more reading, realised that the old DC was our only Global Catalog server, and followed the steps to change the role over to the new DC. Everything said it had worked, but nothing changed.

I did some more reading about role masters and set the new DC to be the master for each role – at least I thought I did – through the AD users and groups tool. Still nothing.
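With hindsight, a simple checklist would have been more reassuring than "I thought I did". There are five operations master (FSMO) roles, and only the three domain-wide ones are moved from the AD Users and Computers console; the two forest-wide ones live in other snap-ins, so they are easy to miss. The sketch below is a minimal illustration in Python, not something we ran at the time. The server names and the hand-typed `holders` mapping are hypothetical; in practice the holders could be read from something like `netdom query fsmo` in the Support Tools.

```python
# The five FSMO (operations master) roles in an Active Directory forest.
FSMO_ROLES = (
    "Schema master",          # forest-wide
    "Domain naming master",   # forest-wide
    "RID master",             # per domain
    "PDC emulator",           # per domain
    "Infrastructure master",  # per domain
)


def roles_not_held_by(holders, target_dc):
    """Return the roles that target_dc does not hold.

    holders maps role name -> server name. A role missing from the
    mapping counts as not held, so an incomplete check shows up too.
    """
    target = target_dc.lower()
    return [role for role in FSMO_ROLES
            if holders.get(role, "").lower() != target]


# Hypothetical holders, typed in by hand for illustration only.
holders = {
    "Schema master": "OLDDC",
    "Domain naming master": "OLDDC",
    "RID master": "NEWDC",
    "PDC emulator": "NEWDC",
    "Infrastructure master": "NEWDC",
}
print(roles_not_held_by(holders, "NEWDC"))
# prints ['Schema master', 'Domain naming master']
```

An empty list would mean the new DC really does hold every role.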

At this point I decided that either I could spend days or weeks researching and prodding, or I could call in the cavalry. The support team we have access to as a Gold Partner are fantastic – I can never praise them enough – and sure enough I had people on the problem within an hour of logging the call.

Because we initially thought the problem was with our Exchange config, we dealt with a very efficient Exchange support guy. He worked methodically through the problem, and started to look deeper into our domain and DCs as he zeroed in on it being a domain issue.

At this point, I encountered the AD support tools being used in anger for the first time. I passed the support guys dozens of log files. We also discovered what appeared to be the problem – my new DC wasn’t really a DC!

That last statement is a bit too simplistic. Our new DC was happily replicating the AD. It reported everything being fine when examined with replmon. Both DCs agreed on their view of the world.

What I didn’t know was that in addition to the AD replicas, a NETLOGON share is created on the new DC by dcpromo. I also did not know that this process had failed – at no point did anything tell me. Because there was no share, the server was not dealing with client requests correctly, which is why our systems had a fit when I unplugged the old DC.
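The check I wish I had known to make after dcpromo is trivial: does each DC actually publish its SYSVOL and NETLOGON shares? Here is a minimal sketch in Python, assuming it is run from a Windows machine that can reach the servers. OLDDC and NEWDC are hypothetical names, and the `exists` parameter is only there so the logic can be exercised without a real domain.

```python
import os

# Shares a working domain controller is expected to publish.
DC_SHARES = ("SYSVOL", "NETLOGON")


def missing_dc_shares(server, exists=os.path.isdir):
    """Return the expected DC shares that cannot be reached on server.

    exists is the function used to probe a UNC path. It defaults to
    os.path.isdir, which only gives a meaningful answer on a Windows
    machine with network access to the server.
    """
    return [share for share in DC_SHARES
            if not exists(r"\\{}\{}".format(server, share))]


if __name__ == "__main__":
    # OLDDC and NEWDC are hypothetical server names.
    for dc in ("OLDDC", "NEWDC"):
        missing = missing_dc_shares(dc)
        print(dc, "missing:", ", ".join(missing) if missing else "nothing")
```

A share that is unreachable for any reason (permissions, name resolution) shows up as missing, so treat a hit as a prompt to look closer rather than as a diagnosis.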

Peering into a deep, dark well

Having identified the fault, my Exchange guy called in an AD specialist to assist. He ably worked through the fault. There is a sequence of steps which will trigger a rebuild of the NETLOGON share. We worked through them. They didn’t work. We knew they didn’t work because the share wasn’t created. Apart from a couple of event log messages which I didn’t consider to be helpful, nothing told us what was wrong.

Having failed to rebuild the share on the new DC, my AD ninja looked at the old DC. He decided to rebuild the same share on the existing DC, the thinking being that the replication was failing because of a fault on the source, rather than the destination. In order to do this, the domain group policies would be destroyed and rebuilt as defaults.

This process took some time, but to cut a very long story short, it appears that our default group policy objects were corrupted, which was blocking the replication. By deleting them and rebuilding the sysvol directory structure on our original DC, then forcing a rebuild on the new DC, the AD was fixed.

My eternal gratitude to the Microsoft support guys. My point, long and meandering though the journey has been, is this: At no point did I see anything which suggested corruption of those objects. At no point did I see anything which suggested they were the cause of the replication fault.

My toolbox is missing!

In order to get the information the support guys needed, I had to install first the Support Tools from the installation media and then the resource kit tools downloaded via the web. Those tools should have been installed by default, or at least should have been added when I created my new DC.

Even when I’d installed the tools, they didn’t really give me much information. Now, I will readily admit here that I am new to the tools, and continued reading will doubtless help me in this regard, but the key point is a simple one:

I can’t see what’s going on!

Shhh… say it quietly… NDS

I supported IT solutions including Novell servers for fifteen years before joining Black Marble. In my previous role we had some thirty servers with a fairly complex, but well structured NDS directory. Over those years, we had some problems with replication and corruption, and every time we did, we started with the same procedure: We watched.

What Active Directory is lacking, in my humble opinion, is an equivalent of the Novell DSTrace tool. DSTrace allows you to watch the activity of your directory replicas. By careful use of the various options you can configure your servers to show you replication traffic, requests and responses and more. Colour coding allows you to spot errors and warnings and after a while you start to see patterns in the mass of text. If we had an NDS problem we could use DSTrace to get a feel for the cause – you could see if there were corrupt objects which weren’t replicating between servers. You could even figure out which servers were right and wrong.

Once you’d seen the fault, the DSRepair tool allowed you to tackle it either with surgical precision or with heavy artillery. You could force a replication of an individual object, overwriting the corrupted copy, or use drastic measures like deleting a replica of the directory or a partition.

Where are those tools for Active Directory? If they exist, please tell me, because I’d like to get my hands on them. I can’t imagine dealing with huge installations of AD without that kind of toolset.

A wishlist…

What would I like to see then? I’m writing this post before I start rummaging around the web, and if I find examples of these tools I’ll post about them.

  1. A tool which checks the integrity of the directory and its objects, and identifies where replicas on different servers disagree.
  2. A tool that allows me to see all the AD traffic in real time – logging to a database might be useful, but just seeing the messages on screen would be a start. I want to be able to toggle different messages – errors, warnings, replication traffic, client requests and responses etc to get a feel for what works and what doesn’t.
  3. A tool to allow me to fix individual objects – to replace them from backup or to overwrite them with a copy from another replica (by far my preferred method).
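To make the first item concrete, this is roughly the comparison I have in mind, as a minimal sketch in Python. It assumes each replica can be dumped to a mapping of object name to some version marker (a hash of the attributes, say); how that dump is obtained is deliberately left out, and the object names and markers below are made up.

```python
def compare_replicas(replica_a, replica_b):
    """Compare two snapshots mapping object name -> version marker.

    Returns three sorted lists: objects only in A, objects only in B,
    and objects present in both whose version markers differ.
    """
    names_a, names_b = set(replica_a), set(replica_b)
    only_a = sorted(names_a - names_b)
    only_b = sorted(names_b - names_a)
    differ = sorted(name for name in names_a & names_b
                    if replica_a[name] != replica_b[name])
    return only_a, only_b, differ


# Hypothetical snapshots from two domain controllers.
old_dc = {"CN=Default Domain Policy": "v7", "CN=Alice": "v3", "CN=Bob": "v1"}
new_dc = {"CN=Default Domain Policy": "v5", "CN=Alice": "v3", "CN=Carol": "v1"}

only_old, only_new, differ = compare_replicas(old_dc, new_dc)
print("Only on old DC:", only_old)
print("Only on new DC:", only_new)
print("Disagree:", differ)
```

It says nothing about which copy is right, which is exactly the judgement DSTrace used to help with.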

If this lot already exists then tell me. If there are good books on the subject then point me at them. I’ve found some support articles which are helpful, but not as much as I’d like. I’m not precious – if this all stems from a fundamental misunderstanding or lack of knowledge on my part I’m happy to admit my mistake. However, at this point I’m leaning more to it being an indication that AD still hasn’t matured to the level of NDS in terms of management and control.

X2100 IPMI Redux – success!

In hindsight I should have thought of it, but even if I had, others got there first.

You may remember my problems with IPMI on our X2100 servers from an earlier posting. Today I had cause to revisit the matter, as we’re having terrible issues with the Nvidia RAID on one of our servers.

The lack of a Windows version of IPMItool is still a pain, but I am leagues closer to a usable solution now, thanks to Cygwin. The solution, it turns out, whilst somewhat laborious, is fairly straightforward. Simply build IPMItool under Cygwin. Result!

Instructions are available on the ‘net and the IPMItool man page is on Sourceforge.

I can now query the SMDC board on my X2100s from Windows.
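For anyone wanting to script this, a query looks something like the sketch below. It is a minimal Python wrapper, assuming the Cygwin-built `ipmitool` is on the PATH and that the board answers on ipmitool’s `lan` interface. The address and user name are placeholders, and no password option is passed, so ipmitool should prompt for one.

```python
import subprocess


def ipmitool_command(host, user, *subcommand):
    """Build an ipmitool command line for a remote board over the LAN interface."""
    if not subcommand:
        raise ValueError("an ipmitool subcommand is required")
    return ["ipmitool", "-I", "lan", "-H", host, "-U", user, *subcommand]


def run_ipmitool(host, user, *subcommand):
    """Run ipmitool, letting it prompt for the password and print to the console."""
    return subprocess.run(ipmitool_command(host, user, *subcommand)).returncode


# Placeholder address and user name; nothing is executed here.
print(" ".join(ipmitool_command("192.0.2.10", "admin", "chassis", "status")))
```

Calling `run_ipmitool("192.0.2.10", "admin", "sdr", "list")` would then list the sensor readings, assuming the board is reachable.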

Windows Home Server – something for my father

On Saturday I got the email telling me that I’d been accepted onto the Home Server Beta 2. I’m excited about this product in a way that I haven’t been about new software solutions for a while.

I’ve taken part in beta programmes before. I’ve been around a while, and as an IT pro you get desensitised after a while. Vista has some innovative features, but it’s evolution, not revolution.

Home server is different.

To explain why, let me give you a bit of background: Being a geek, you’d expect my home to have a few PCs and you’d be right. I had a purge shortly after I got married, which reduced the number of active systems from eight (don’t ask!) to four – my home desktop, my wife’s home desktop, a media PC and a Mac Mini (which I use for web site testing and development). On top of that, we have a Netgear SC101 NAS box for shared storage, a networked printer and a photo printer attached to my wife’s PC.

My Grandmother has firmly embraced the information age. She has a desktop and a laptop. She sends emails all over the place and is slowly scanning all the photographs that the family has collected over the years. The desktop stays on all the time with a file share for the laptop.

My parents have a computer each. They also have a Netgear SC101 and a couple of printers. In addition, my father has a laptop.

Particularly for my parents and grandmother, the Home Server will be a perfect match to requirements. A black box that can back up systems, is easy to manage and allows file and printer sharing – great!

Being the de facto tech support for my family, the opportunity to put one system in each home that can do automatic backups and store all the important files safely is extremely welcome. I’m looking forward to getting my Home Server beta up and running and if it works like the documentation suggests, there’ll be three customers lining up for a copy when it’s released.

Vista Upgrade – attempts 4, success 0

I have yet to succeed in upgrading from Windows XP to Windows Vista. Each time, setup runs through to the ‘completing upgrade’ phase and gets about halfway through it, whereupon the machine gets stuck in a reboot cycle.

I have tried this now on three separate machines and two different installed partitions on one of them.

Two of the machines were Shuttle SN25G2 SFF boxes with nForce 2 motherboards and the onboard nForce 2 (basically a GeForce 2) video.

One of them was an Acer E360, an nForce 3 chipset box with an Nvidia 6600GT display card.

On the OS front, the Shuttles ran XP Pro SP2, fully patched; the Acer has the XP MCE that it came with, and an XP Pro SP2 install.

I’m starting to wonder if the common denominator here is Nvidia. Even though I spent a long time stripping drivers and applications off my Acer and repeatedly retrying, I have not managed a successful upgrade. Has anybody managed to upgrade an nForce system?

What I will say, having now lost days of my life to failed upgrades, is that the Upgrade Rollback feature of Vista is fantastic! A no messing, works every time, put it back to how you found it option that takes only a few minutes. Wonderful!

So, now I’m going to look into the recently released Windows Easy Transfer Companion as a way to get my applications across onto Vista.

Why do I need to do that? Because Acer, like so many other manufacturers these days, provides no installation media for the applications they ship with the computer. Unless I want to shell out again for things like PowerDVD and NTI CD-Maker I need to either upgrade (been there, tried that), hack the cached installed files (also tried, and failed) or use a magic bullet (see above). I’ll let you know how I get on with that one.

SharePoint 2007 on x64 – don’t try to run 32-bit web apps!

We’re slowly migrating services onto our new servers here at Black Marble. This morning we had one of those moments where significant amounts of wall kicking and teeth gnashing ensue.

Basically, we forgot that if you enable 32-bit .NET support on IIS 6 it disables 64-bit support – you can’t run 32-bit and 64-bit apps concurrently.

We spent a long time the other week getting our release version of SharePoint 2007 installed on one of our shiny Sun X2100 x64 servers. We expect the site to be quite large, so it made sense to run the x64 version of SharePoint.

Unfortunately, when 32-bit .NET apps were enabled by mistake, SharePoint and the other 64-bit web apps all stopped. Removing .NET 1.1 and running the aspnet_regiis -i command from the x64 .NET 2 framework folder got us back up and running, but SharePoint refused to allow anyone to log in.
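For reference, the usual way back to 64-bit mode on IIS 6 is to clear the Enable32bitAppOnWin64 metabase setting and then re-register the 64-bit ASP.NET 2.0. The sketch below is a minimal Python helper that only builds those two command lines so they can be reviewed before anything is run. The default drive and Windows folder are assumptions about a standard install, and this is not a record of exactly what we typed on the day.

```python
def sixty_four_bit_recovery_commands(system_drive="C:", system_root=r"C:\WINDOWS"):
    """Build the commands that switch IIS 6 back to 64-bit ASP.NET 2.0.

    The default paths assume a standard Windows Server 2003 x64 install.
    """
    adsutil = system_drive + r"\Inetpub\AdminScripts\adsutil.vbs"
    aspnet_regiis = (system_root
                     + r"\Microsoft.NET\Framework64\v2.0.50727\aspnet_regiis.exe")
    return [
        # Turn off 32-bit worker processes for the application pools.
        ["cscript", adsutil, "SET", "W3SVC/AppPools/Enable32bitAppOnWin64", "0"],
        # Re-register the 64-bit ASP.NET 2.0 with IIS.
        [aspnet_regiis, "-i"],
    ]


for command in sixty_four_bit_recovery_commands():
    print(" ".join(command))
```

Each entry is a plain argument list, so it could be handed to `subprocess.run` on the server once you are happy with it.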

Fortunately the Central Admin site was still working, so I had a rummage. It looked like SharePoint was no longer talking to our AD, so I went to the Application Management site, and went to the Authentication Providers option in the Application Security section.

In here you can edit the settings for each Web Application. I went through each of ours, and clicked on the ‘Default’ zone which is listed in the Web Application page.

I didn’t need to change anything – simply hit the save button and SharePoint seemed to rewrite its settings. Once this was done, our SharePoint started talking to people again.

Now, I don’t expect you to hit the same crazy situation as we did, but it’s nice to know that you can coax SharePoint back into life without restoring stuff from backup.