# The Data Science Blog

### Blog of Gary Short

One of the things that makes data science hard is that it has a foundation in statistics, and one of the things that makes statistics hard is that it can be counter-intuitive. A great illustration of that is the Monty Hall Problem. Monty Hall was a US game show host who presented a show called “Let’s Make a Deal”, and the Monty Hall Problem is modelled on that show. It goes something like this:

As a contestant, you are presented with three doors. Behind one of the doors is a car and behind the other two there are goats. You are asked to pick a door. Monty will then open one of the other doors, revealing a goat and will then ask you if you want to swap your pick to the remaining closed door. The dilemma is, should you stick or swap, and does it really matter?

When faced with this dilemma, most people get it wrong, mainly because the correct answer runs counter to intuition. Intuition runs something like this:

There are three doors, and behind one of them there is a car. Each door is equally likely, therefore there is a 1/3 chance of guessing correctly at this stage.

Monty then opens one of the remaining doors, showing you a goat. There are now two doors left, one hiding a car and the other a goat. Each is equally likely, therefore the chances of guessing correctly at this stage are 50:50. Since the odds are equal there is no benefit, nor harm, in switching, and so it doesn’t matter if you switch or not.

That’s the intuitive thinking, and it’s wrong. In fact if you switch, you will win twice as often as you lose. What! I hear you say. (Literally, I can hear you say that). Yeah I know, it’s hard to believe isn’t it? See, I told you it was counter-intuitive. So, let me prove it to you.

Firstly, let’s state some assumptions. Sometimes these assumptions are left implied when the problem is stated, but we’ll make them explicit here for the purposes of clarity. Our assumptions are:

1. 1 door has a car behind it.
2. The other 2 doors have goats behind them.
3. The contestant doesn’t know what is behind each door.
4. Monty knows where the car is.
5. The contestant wants to win the car not the goat. (Seems obvious but, you know…)
6. Monty must reveal a goat.
7. If Monty has a choice of which door to open, he picks with equal likelihood.

Now let’s have a concrete example to work through. Let’s say you are the contestant and you pick door 1, then Monty shows you a goat behind door 2. The question now is should you swap to door 3 or stick with door 1, and I say you’ll win twice as often as you’ll lose if you swap to door 3.

Let’s use a tree diagram to work through this example, as we have a lot of information to process:

There are a couple of variables we have to condition on: where the car is, and which door Monty shows us. It’s that second condition that intuition ignores; remember, Monty *must* show us a goat. So looking at the diagram we can see we choose door 1 and the car can be behind doors 1, 2 or 3 with equal probability; so, 1/3, 1/3, 1/3.

Next, let’s condition for the door Monty shows us. So if we pick 1 and the car is behind 1, he can show us doors 2 or 3, with equal likelihood, so we’ll label them 1/2, 1/2. If we pick 1 and the car is behind door 2, Monty has no choice but to show us door 3, so we’ll label that 1, and finally, if we pick 1 and the car is behind 3 then Monty must show us door 2, so again, we’ll label that 1.

Now, we said in our example that Monty shows us door 2, so we must be on either the top branch or the bottom branch (circled in red). To work out the probabilities we just multiply along the branches, so the probability of the top branch is 1/3 × 1/2 = 1/6 and of the bottom branch is 1/3 × 1 = 1/3. Having done that, we must re-normalise so that the two probabilities add to 1. Together they sum to 1/2, so we divide each by 1/2 (that is, multiply each by 2), giving us 1/3 and 2/3, making 1 in total.

So now, if we just follow the branches along, we see that if we pick door 1 and Monty shows us door 2, there is a 2/3 probability that the car is behind door 3 and only a 1/3 probability that it is behind door 1. So we should swap, and if we do so, we’ll win twice as often as we lose.
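For anyone who prefers symbols to branches, the same working can be sketched as conditional probabilities. The notation is mine and only for illustration: $C_i$ means "the car is behind door $i$" and $M_2$ means "Monty opens door 2", all given that we picked door 1 and that the assumptions listed earlier hold.

$$
P(M_2) = P(C_1)\,P(M_2 \mid C_1) + P(C_2)\,P(M_2 \mid C_2) + P(C_3)\,P(M_2 \mid C_3)
       = \tfrac{1}{3}\cdot\tfrac{1}{2} + \tfrac{1}{3}\cdot 0 + \tfrac{1}{3}\cdot 1 = \tfrac{1}{2}
$$

$$
P(C_1 \mid M_2) = \frac{P(C_1)\,P(M_2 \mid C_1)}{P(M_2)} = \frac{1/6}{1/2} = \tfrac{1}{3},
\qquad
P(C_3 \mid M_2) = \frac{P(C_3)\,P(M_2 \mid C_3)}{P(M_2)} = \frac{1/3}{1/2} = \tfrac{2}{3}
$$

Dividing by $P(M_2) = 1/2$ is exactly the "multiply each by 2" re-normalisation step from the tree.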

The good thing about living in the age of computers is that we now have the number-crunching ability to check this kind of thing by brute force. Below is some code to run this simulation 1,000,000 times and then state the percentage of winners:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication2
{
    // Define a door
    internal class Door
    {
        public bool HasCar { get; set; }
        public int Number { get; set; }
    }

    public class Program
    {
        public static void Main()
        {
            // Create a tally of winners and losers
            List<int> tally = new List<int>();

            // We'll need some random values
            var rand = new Random(DateTime.Now.Millisecond);

            // Run our simulation 1,000,000 times
            for (int i = 0; i < 1000000; i++)
            {
                // First create three numbered doors
                List<Door> doors = new List<Door>();
                for (int j = 1; j < 4; j++)
                {
                    doors.Add(new Door { Number = j });
                }

                // Randomly assign one a car
                doors[rand.Next(0, 3)].HasCar = true;

                // Next the contestant picks one
                Door contestantChoice = doors[rand.Next(0, 3)];

                // Then Monty shows a goat door. Find returns the first match,
                // so when Monty has a choice he always opens the lower-numbered
                // door; that does not change how often swapping wins overall.
                Door montyChoice = doors.Find(x =>
                    x != contestantChoice && !x.HasCar);

                // Then the contestant swaps
                contestantChoice = doors.Find(x =>
                    x != contestantChoice && x != montyChoice);

                // Record a 1 for a win and a 0 for a loss
                tally.Add(contestantChoice.HasCar ? 1 : 0);
            }

            // State winners as a percentage
            Console.WriteLine(tally.Count(x =>
                x == 1) / (float)tally.Count() * 100);
        }
    }
}
```

When I run this code on my machine (YMMV) I get the following result:

Which is pretty much bang on what we predicted.
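If you want to check the specific worked example rather than the overall swap rate, here is a minimal sketch that does so. It assumes you always pick door 1, lets Monty choose at random when he has a choice (assumption 7), and keeps only the games where he opens door 2. The class name, method name, seed and game count are all made up for illustration.

```csharp
using System;

public static class ConditionalMontyHall
{
    // Returns the fraction of kept games (contestant picked door 1,
    // Monty opened door 2) in which swapping to door 3 wins the car.
    public static double SwapWinRate(int games, int seed)
    {
        var rand = new Random(seed);
        int kept = 0;      // games where Monty opened door 2
        int swapWins = 0;  // kept games where the car was behind door 3

        for (int i = 0; i < games; i++)
        {
            int car = rand.Next(1, 4);   // car is behind door 1, 2 or 3
            int monty;

            if (car == 1)
            {
                // We are holding the car, so Monty has a free choice
                // between doors 2 and 3 and picks with equal likelihood.
                monty = rand.Next(2, 4);
            }
            else
            {
                // Monty must open the only other goat door.
                monty = car == 2 ? 3 : 2;
            }

            if (monty != 2) continue;    // not the case in the worked example

            kept++;
            if (car == 3) swapWins++;
        }

        return (double)swapWins / kept;
    }

    public static void Main()
    {
        // Hypothetical seed and game count, chosen only for repeatability.
        Console.WriteLine(SwapWinRate(1000000, 42));
    }
}
```

Under those assumptions the printed fraction should land close to 2/3, matching the tree diagram. About half of the games are thrown away, because Monty only opens door 2 about half the time.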

Well that’s all for this post, until next time, keep crunching those numbers.

# But it works on my PC!

### The random thoughts of Richard Fennell on technology and software development

The new Yorkshire Extreme Programming Club seems to be getting off to a good start. The first meeting was well attended and there is activity on the message board.

Take a look at http://www.extremeprogrammingclub.com/

I recently had one of our Windows 2003 servers lose its disk mirrors and lock up. When it was restarted it had two (virtually identical) drives, C: and E:. It booted off the primary mirror disk (C:) and all seemed OK except SQL.

I also tried booting off the secondary mirror (E:) but this would not boot (this drive, it turns out, had some bad blocks).

So I went back to the primary disk. The actual problem was that SQL Server started but then stopped after a few seconds, and the Windows error log showed the unhelpful 3414 error. I googled for this, but all that was mentioned was issues with DTC, which did not seem relevant as we do not use distributed transactions. There was nothing else on the web of note.

I had a look at the MSQL.1\logs directory and this showed problems loading the various databases. So it seems when the disk de-mirrored it was writing SQL transaction logs, and they ended up corrupted. So in my case a generic 3414 error in the error log meant corrupt transactions that could not be rolled forward or back.

More in hope than expectation, I tried copying the SQL data files and logs back from the faulty secondary drive (E:) and restarting SQL, and this worked: SQL started without a problem! I was lucky the bad blocks were not near the SQL files. This saved me from having to rebuild the server and restore backups, especially as some of the DBs were SharePoint, and a SharePoint SQL restore is rarely fun!

When installing the Cassini Personal Server on a PC you will often get the error "Cassini managed web server failed to start listening to port 80. Possible conflict with another web server on the same port."

You of course think this is a firewall, other web server or anti-virus port blocker problem.

IT IS NOT!

OK, it might be those problems as well, but usually it is that you need to run:

gacutil /i c:\cassini\cassini.dll

or just drag a copy of cassini.dll into the GAC (C:\Windows\Assembly).

Shame the installer does not do this.

