Installing Hortonworks Sandbox 1.3 VM, on Hyper-V, Under Windows 8 And Configuring it to Connect to The Internet

Having won this week’s award for the longest blog title here at Black Marble, I’d better get on with the actual post. 

Every data scientist is going to have to work with large data files at some point in their career, and right now the de facto standard for doing so is Hadoop. There are lots of ways to gain access to Hadoop, from complete “roll your own” solutions right up to pre-packaged, ready-to-go solutions from people like Hortonworks.

For the purposes of this post we are going to take the easiest possible route to getting up and running with Hadoop, and that’s to use the “sandbox” VM from Hortonworks. So what is this sandbox? Well, according to the Hortonworks site, it’s:

A personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials. All packaged up in a virtual environment that you can get up and running in 15 minutes!

And that’s pretty accurate to be honest, if you discount the download time of the 1.9 GB VM of course.

The first thing you’re going to want to do is to ensure that you have Hyper-V installed on your Windows 8 machine. To do that, hit the Windows key and search for “Programs and Features”, then select the app of that name from the results:

image

Over on the left-hand side, you’ll see a link entitled “Turn Windows features on or off”. Click on that and, in the dialog that appears, make sure the box beside “Hyper-V” is checked, click OK, and you’re done.

Having done that, trot across to the Hortonworks website and download the sandbox VM, by clicking on the “Get Sandbox” button…

image

then selecting the VM of your choice…

image

We’re going to take the Hyper-V one. Once you have the zip file downloaded, unzip it to a convenient location.

Now, the instructions for installing the VM tell you to create an internal virtual switch within Hyper-V, and that’s okay to get things up and running and for experimentation. However, the sandbox comes with a series of tutorials, which Hortonworks updates from time to time, and there’s a button that lets you download the latest version of them, so we’re going to want to configure our VM to be able to connect to the Internet. To do that, open Hyper-V Manager and click on the “Virtual Switch Manager” link under “Actions”, on the right-hand side of the window:

image

From there, go ahead and create a new “External Virtual Switch”…

image

And bridge it with the NIC you have connected to the Internet. As you can see, I’ve got mine connected to my WiFi, and that’s going to cause us a little problem that we’re going to have to fix later on, but I’ll come to that…

image

Next, from the “Action” menu, select “Import Virtual Machine” and navigate to the location where you extracted the download…

image

Step through the wizard, and when you have done so, you’ll have the VM installed…

image

Now we’re going to start up the VM and configure it to access the Internet…

image

By default, the VM is configured to have a static IP address of 192.168.56.101, which isn’t any use to us. We’ll have to configure our VM for Internet access, so go ahead and follow the instructions on the screen and press <Alt+F5> to log into the machine.
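As a quick aside on why that default address is no use: a static address only helps if it sits on the same network as the machine you are browsing from. The sketch below uses Python's standard ipaddress module to make the point. The network ranges are assumptions for illustration only (the /8 range stands in for an office network that hands out 10.X addresses, and the /24 range is a guess at the network the default address was meant for), so substitute your own.

```python
import ipaddress


def on_network(address, network):
    """Return True if `address` falls inside `network` (CIDR notation)."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(network)


# The sandbox's default static address, as stated in the post.
SANDBOX_DEFAULT = "192.168.56.101"

# Assumption: an office network handing out 10.X addresses.
OFFICE_NETWORK = "10.0.0.0/8"

# Assumption: the network the default address was presumably intended for.
DEFAULT_NETWORK = "192.168.56.0/24"

print(on_network(SANDBOX_DEFAULT, OFFICE_NETWORK))   # False
print(on_network(SANDBOX_DEFAULT, DEFAULT_NETWORK))  # True
```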

image

The default credentials for the sandbox VM are UID: root PWD: hadoop.

Having logged in, we want to edit the config file at /etc/sysconfig/network-scripts/ifcfg-eth0 and set it up for DHCP…

image

And that’s where things get interesting. If you have a wired Internet connection then you are probably going to be okay; if, like me, you want to use a WiFi connection, then things get tricky. Virtualised WiFi connectivity is problematic and it proved to be so in my case. Now, in theory, all we have to do is to edit the file to have the following settings:

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
NAME="System eth0"

then that should be an end to it, and to be fair, when I’m in the office, that works perfectly, with the office DHCP server issuing me with a 10.X address. However, when I come home and connect my laptop to my home network, the VM can’t seem to get an IP address. This is caused by an issue deep in the WiFi stack that I neither understand nor know how to fix, if it even can be fixed. My workaround, as you can see from the screenshot above, is to keep both sets of settings in the file: when I’m in the office I comment out the parts required to set a static IP address, and when I’m at home I comment out the DHCP parts. I then save the file, reboot the VM and I’m connected to the Internet. Confirm this by pinging a known host…
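If you find yourself switching between the office and home settings a lot, the two variants of the file can be generated rather than hand-edited. What follows is only a minimal sketch, not the way the sandbox itself does it: render_ifcfg is a hypothetical helper, the static address, netmask and gateway are placeholders for whatever your home network uses, and BOOTPROTO=none is the value Red Hat style systems document for a statically configured interface.

```python
def render_ifcfg(mode, ipaddr=None, netmask=None, gateway=None):
    """Return the text of an ifcfg-eth0 file for 'dhcp' or 'static' mode."""
    lines = ["DEVICE=eth0", "ONBOOT=yes"]
    if mode == "dhcp":
        lines.append("BOOTPROTO=dhcp")
    elif mode == "static":
        if not (ipaddr and netmask and gateway):
            raise ValueError("static mode needs ipaddr, netmask and gateway")
        lines += [
            "BOOTPROTO=none",  # no DHCP: use the fixed values below
            "IPADDR=" + ipaddr,
            "NETMASK=" + netmask,
            "GATEWAY=" + gateway,
        ]
    else:
        raise ValueError("mode must be 'dhcp' or 'static'")
    lines.append('NAME="System eth0"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Office: let the DHCP server hand out an address.
    print(render_ifcfg("dhcp"))
    # Home: placeholder values, replace with ones valid on your network.
    print(render_ifcfg("static", "192.168.1.50", "255.255.255.0", "192.168.1.1"))
```

The function only builds the text. Writing it to /etc/sysconfig/network-scripts/ifcfg-eth0 and rebooting the VM is still up to you.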

image

Now that we have our sandbox VM connected to the Internet, we can point a browser at the VM’s IP address (either the one given by your DHCP server, and shown in the VM launch screen, or the one you set statically, if you have the same issue as I do at home)…

image
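If the page doesn’t appear, it can help to check from the host whether the VM is answering HTTP requests at all before blaming the browser. Here is a small sketch using only the Python standard library. The function name sandbox_is_up is made up for this post, and the URL you pass in is whatever address your VM ended up with; proxies are deliberately bypassed on the assumption that the VM sits on your local network.

```python
import urllib.error
import urllib.request


def sandbox_is_up(base_url, timeout=5.0):
    """Return True if an HTTP GET to base_url succeeds, False otherwise."""
    # Assumption: the VM is on the local network, so skip any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

Calling sandbox_is_up("http://10.0.0.25/") (a hypothetical address) returns True or False rather than raising, so it slots easily into a retry loop while the VM finishes booting.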

Hit the “Start” button to run the tutorials…

image

and click the “Update” button, to ensure you have the “latest and greatest” version of the tutorials…

image

And that’s it, you’re now ready to work your way through the tutorials and become a Hadoop expert!

Well, that’s all for this post. Until next time, happy number crunching!