One of the things that makes data science hard is that it has a foundation in statistics, and one of the things that makes statistics hard is that it can run counter to intuition. A great illustration of that is the Monty Hall Problem. Monty Hall was a US game show host who presented a show called “Let’s Make a Deal”, and the Monty Hall Problem is modelled on that show. It goes something like this:

As a contestant, you are presented with three doors. Behind one of the doors is a car and behind the other two there are goats. You are asked to pick a door. Monty will then open one of the other doors, revealing a goat, and ask you if you want to swap your pick to the remaining closed door. The dilemma is: should you stick or swap, and does it really matter?

When faced with this dilemma, most people get it wrong, mainly because the correct answer runs counter to intuition. Intuition runs something like this:

There are three doors, and behind one of them there is a car. Each door is equally likely, therefore there is a 1/3 chance of guessing correctly at this stage.

Monty then opens one of the remaining doors, showing you a goat. There are now two doors left, one hiding a car, the other a goat. Each is equally likely, therefore the chances of guessing correctly at this stage are 50:50. Since the odds are equal there is no benefit, nor harm, in switching, and so it doesn’t matter if you switch or not.

That’s the intuitive thinking, and it’s wrong. In fact, if you switch, you will win twice as often as you lose. What! I hear you say. (Literally, I can hear you say that). Yeah I know, it’s hard to believe, isn’t it? See, I told you it was counter-intuitive. So, let me prove it to you.

Firstly, let’s state some assumptions. Sometimes these assumptions are left implied when the problem is stated, but we’ll make them explicit here for the purposes of clarity. Our assumptions are:

1. One door has a car behind it.

2. The other two doors have goats behind them.

3. The contestant doesn’t know what is behind each door.

4. Monty knows where the car is.

5. The contestant wants to win the car not the goat. (Seems obvious but, you know…)

6. Monty must reveal a goat.

7. If Monty has a choice of which door to open, he picks with equal likelihood.

Now let’s have a concrete example to work through. Let’s say you are the contestant and you pick door 1, then Monty shows you a goat behind door 2. The question now is: should you swap to door 3 or stick with door 1? I say you’ll win twice as often as you’ll lose if you swap to door 3.

Let’s use a tree diagram to work through this example, as we have a lot of information to process:

There are a couple of variables we have to condition on: where the car is, and which door Monty shows us. It’s that second condition that intuition ignores; remember, Monty *must* show us a goat. So, looking at the diagram, we can see that we choose door 1 and the car can be behind door 1, 2 or 3 with equal probability: 1/3, 1/3, 1/3.

Next, let’s condition on the door Monty shows us. If we pick 1 and the car is behind door 1, he can show us door 2 or door 3 with equal likelihood, so we’ll label them 1/2, 1/2. If we pick 1 and the car is behind door 2, Monty has no choice but to show us door 3, so we’ll label that 1. Finally, if we pick 1 and the car is behind door 3, then Monty must show us door 2, so again we’ll label that 1.
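
If you’d rather see the tree as a list of branches, here is a minimal sketch that builds it for a contestant who picks door 1. The class name `MontyTree` and the helper `Build` are made up for this illustration, and every branch probability is stored as 1 over a denominator, which only works because each branch here happens to be 1/3 or 1/6.

```csharp
using System;
using System.Collections.Generic;

public static class MontyTree
{
    // Hypothetical helper: lists every branch of the tree for a given pick.
    // Each branch records where the car is, which door Monty opens, and the
    // denominator of its probability (the probability itself is 1/Denominator).
    public static List<(int Car, int Monty, int Denominator)> Build(int pick)
    {
        var branches = new List<(int Car, int Monty, int Denominator)>();

        for (int car = 1; car <= 3; car++)
        {
            // Monty may open any door that is neither our pick nor the car.
            var options = new List<int>();
            for (int door = 1; door <= 3; door++)
            {
                if (door != pick && door != car) options.Add(door);
            }

            foreach (int monty in options)
            {
                // P(car here) = 1/3, P(Monty opens this door) = 1/options.Count,
                // so multiplying along the branch gives 1/(3 * options.Count).
                branches.Add((car, monty, 3 * options.Count));
            }
        }

        return branches;
    }

    public static void Main()
    {
        foreach (var b in Build(1))
        {
            Console.WriteLine($"car behind {b.Car}, Monty opens {b.Monty}: 1/{b.Denominator}");
        }
        // Prints:
        // car behind 1, Monty opens 2: 1/6
        // car behind 1, Monty opens 3: 1/6
        // car behind 2, Monty opens 3: 1/3
        // car behind 3, Monty opens 2: 1/3
    }
}
```

The four branches add up to 1, and only two of them have Monty opening door 2.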

Now, we said in our example that Monty shows us door 2, so we must be on either the top branch or the bottom branch (circled in red). To work out the probabilities we just multiply along the branches, so the probability of the top branch is 1/3 × 1/2 = 1/6 and on the bottom branch it’s 1/3 × 1 = 1/3. Having done that, we must re-normalise so that the probabilities add to 1. The two branches total 1/6 + 1/3 = 1/2, so we divide each by 1/2 (which is the same as multiplying each by 2), giving us 1/3 and 2/3, making 1 in total.
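
For anyone who prefers symbols, that re-normalisation is just Bayes’ rule. Written out in full, with notation that is only a convenience for this walkthrough (we picked door 1 and Monty opened door 2):

$$
\begin{aligned}
P(\text{car behind 1} \mid \text{Monty opens 2}) &= \frac{\tfrac{1}{3} \times \tfrac{1}{2}}{\tfrac{1}{3} \times \tfrac{1}{2} + \tfrac{1}{3} \times 1} = \frac{1/6}{1/2} = \frac{1}{3} \\
P(\text{car behind 3} \mid \text{Monty opens 2}) &= \frac{\tfrac{1}{3} \times 1}{\tfrac{1}{3} \times \tfrac{1}{2} + \tfrac{1}{3} \times 1} = \frac{1/3}{1/2} = \frac{2}{3}
\end{aligned}
$$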

So now, if we just follow the branches along, we see that if we pick door 1 and Monty shows us door 2, there is a 2/3 probability that the car is behind door 3 and only a 1/3 probability that it is behind door 1. So we should swap, and if we do, we’ll win twice as often as we lose.

The good thing about living in the age of computers is that we now have the number crunching abilities to check this kind of thing by brute force. Below is some code to run this simulation 1,000,000 times and then state the percentage of winners:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    namespace ConsoleApplication2
    {
        // Define a door
        internal class Door
        {
            public bool HasCar { get; set; }
            public int Number { get; set; }
        }

        public class Program
        {
            public static void Main()
            {
                // Create a tally of winners and losers
                List<int> tally = new List<int>();

                // We'll need some random values
                var rand = new Random(DateTime.Now.Millisecond);

                // Run our simulation 1,000,000 times
                for (int i = 0; i < 1000000; i++)
                {
                    // First create three numbered doors
                    List<Door> doors = new List<Door>();
                    for (int j = 1; j < 4; j++)
                    {
                        doors.Add(new Door { Number = j });
                    }

                    // Randomly assign one a car
                    doors[rand.Next(0, 3)].HasCar = true;

                    // Next the contestant picks one
                    Door contestantChoice = doors[rand.Next(0, 3)];

                    // Then Monty shows a goat door. Find returns the first match,
                    // so Monty is not choosing at random here; that does not change
                    // how often an always-swap contestant wins.
                    Door montyChoice = doors.Find(x => x != contestantChoice && !x.HasCar);

                    // Then the contestant swaps
                    contestantChoice = doors.Find(x => x != contestantChoice && x != montyChoice);

                    // Record a 1 for a win and a 0 for a loss
                    tally.Add(contestantChoice.HasCar ? 1 : 0);
                }

                // State winners as a percentage
                Console.WriteLine(tally.Count(x => x == 1) / (float)tally.Count() * 100);
            }
        }
    }

When I run this code on my machine (YMMV) I get the following result:

Which is pretty much bang on what we predicted.
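
If you want to test the worked example itself (we pick door 1, Monty opens door 2), here is a rough sketch of a second simulation that keeps only the games matching that situation. The class `ConditionalMonty`, the method `SwapWinRate` and the fixed seed are all invented for this illustration. Unlike the always-swap tally, this one does depend on assumption 7, so Monty picks at random whenever he has two goat doors to choose from.

```csharp
using System;
using System.Collections.Generic;

public static class ConditionalMonty
{
    // Hypothetical helper: the fraction of games won by swapping, counting only
    // games where the contestant picked door 1 and Monty opened door 2.
    public static double SwapWinRate(int trials, Random rand)
    {
        const int pick = 1;   // the contestant always picks door 1
        int matching = 0;     // games that fit the worked example
        int swapWins = 0;     // of those, games where door 3 hides the car

        for (int i = 0; i < trials; i++)
        {
            int car = rand.Next(1, 4);   // car is behind door 1, 2 or 3

            // Doors Monty is allowed to open: not our pick, not the car.
            var options = new List<int>();
            for (int door = 1; door <= 3; door++)
            {
                if (door != pick && door != car) options.Add(door);
            }

            // Assumption 7: when Monty has a choice, he picks with equal likelihood.
            int monty = options[rand.Next(options.Count)];

            if (monty != 2) continue;    // not our example, so discard the game
            matching++;
            if (car == 3) swapWins++;    // swapping to door 3 wins
        }

        return (double)swapWins / matching;
    }

    public static void Main()
    {
        var rand = new Random(12345);    // arbitrary fixed seed for repeatable runs
        Console.WriteLine(SwapWinRate(1000000, rand) * 100);
    }
}
```

If the tree is right, the printed percentage should sit close to the 2/3 we worked out by hand.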

Well, that’s all for this post. Until next time, keep crunching those numbers.