Server Core, Hyper-V and VLANs: An Odyssey
A sensible plan
This is a torrid tale of frustration and annoyance, tempered by the fun of digging through system commands and registry entries to try and get things working.
We’ve been restructuring our network at Black Marble. The old single subnet was creaking and we were short of addresses, so we decided to split it into separate subnets for physical servers, virtual internal servers, virtual development servers, desktops, wifi and so on. We don’t have a huge amount of network equipment, and we needed to put virtual servers hosted on Hyper-V onto separate networks, so we decided to use VLANs.
Our new infrastructure has one clever switch that can generate all the VLANs we need, link those VLANs to IP subnets and provide all the routing between them. By doing it this way we can present any subnet to any port on any switch with careful configuration and use of the 802.1Q VLAN standard. Hyper-V servers can have a single physical interface with traffic from multiple VLANs flowing across it to the virtual switch, with individual VMs assigned to specific VLANs.
We did the heavy lifting of the network move without touching our Hyper-V cluster, placing all the NICs of all the servers on the VLAN corresponding to our old IP subnet. We then tested VLANs over the virtual switch in Hyper-V using a separate server and made sure we knew how to configure the switch and Hyper-V to make it all work.
Then we came to the cluster. Running Windows Server 2008 R2 Server Core.
Since we built the cluster Andy and I have come to decide that if we ever rebuild it, server core will not be used. It’s just too darn hard to configure when you really need to, and this is one of those times.
A tricky situation
Before we began to muck around with the VLAN settings, we needed to change the default gateway that the servers used. The old default gateway was the address of our ISA (now a shiny TMG) server. That box is still there, but now we have the router at the heart of the network, whose address is the new default gateway.
To change the default gateway on server core we need a command line tool. Enter Netsh, stage left.
We first need to list the interfaces so we know what we’re doing. IPConfig will list the interfaces and their IP settings. Old lags will no doubt abbreviate the netsh commands that we need next but I’ll write them out in full so they make sense.
Give me a list of the physical network adapters and their connection status: netsh interface show interface
Show me the IPV4 interfaces: netsh interface ipv4 show interface
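(Incidentally, netsh is happy with unambiguous abbreviations, so the second command above can probably be cut down to something like netsh int ipv4 sh int – but as I said, I’ll stick to the full forms here.)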
To change the default gateway we must issue a set command with all the IP settings – just entering the gateway will not work, as all the current settings get wiped first: netsh interface ipv4 set address name="<name>" source=static address=x.x.x.x mask=255.255.255.0 gateway=x.x.x.x
Where <name> is the interface name reported by the commands above, address is the server’s static IP address, mask is its subnet mask and gateway is the new default gateway.
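As an example, if the management NIC were called "Local Area Connection" and we wanted to give it the address 192.168.1.10 on a /24 subnet with 192.168.1.1 as the new default gateway (entirely made-up values – substitute your own), the command would be:
netsh interface ipv4 set address name="Local Area Connection" source=static address=192.168.1.10 mask=255.255.255.0 gateway=192.168.1.1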
It’s worth pointing out that we stopped the cluster service on the server whilst we did this (changing servers one by one so we kept our services running).
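On server core that’s a command line job too; assuming the failover clustering service is running under its usual name of ClusSvc, something like net stop clussvc will stop it, and net start clussvc brings it back afterwards.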
We had two interfaces to change. One corresponded to the NIC used to manage the server and the other corresponded to the one used by the virtual switch for Hyper-V. That accounted for two of the four NICs on our Sun X2200-M2’s, with the SAN iSCSI network taking a third. The SAN used a Broadcom, the spare was the other Broadcom, and the management and virtual switch connections used the two nVidia NICs on the Sun (that will become important shortly).
A sudden problem
Having sorted the IP networking, our next step was to sort out the VLAN configuration. To do that we changed the switch port that the NIC hosting the Hyper-V virtual switch was connected to from being an untagged member of only our server subnet VLAN to being a tagged member of that VLAN and a tagged member of the new VLAN corresponding to our subnet for virtual internal servers.
The next step was to set the VLAN id for a test VM (we could ignore the host as it doesn’t share the virtual switch – it has its own dedicated NIC).
The snag was, the checkbox to enable VLAN ids was disabled when we looked in Hyper-V Manager, both for the virtual switch and for the NIC in the VM.
Some investigation and checking of our test server showed that the physical network driver had a setting, Priority and VLAN, that needed to be set to enable priority and VLAN tagging of traffic, and that the default state was priority only. On a full server install that’s a checkbox in the driver settings. On server core…?
So, first of all we tried to find the device itself. Sadly, the server decided that remote management of devices from another server wasn’t going to be allowed, despite reporting that it should be. So we searched for command line tools.
To get a list of the device drivers running on the machine: sc query type= driver
(note the space before 'driver')
That will give you a list, allowing you to find the network device name.
For our server, that came back with nvenetfd – the nVidia NIC.
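As an aside, the full list is fairly long, so piping it through findstr helps if you have a rough idea of what you’re after – for example, guessing that an nVidia driver name would contain “nv”: sc query type= driver | findstr /i nv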
To use that name and find the file responsible: sc qc <device name>
(in our case nvenetfd)
That returned the nvm62x64.sys driver file. Nothing about settings, but it allowed us to check the driver versions. Hold that thought, I’ll come back to it shortly.
Meanwhile
We’d also been poking at the test server, looking at the NIC settings. Logic suggested that the settings should be in the registry – all we had to do was find them.
I’ll save you the hunt:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}
That key holds all the network adapters. There are keys beneath that are numbered (0000, 0001, etc). The contents of those keys enabled us to figure out which key matched which adapter. Looking at the test server and comparing it to the server core Hyper-V box, we found a string value called *PriorityVlanTag which had a value of 3 on the test server (priority and VLAN enabled) and 1 on the Hyper-V box. We set the Hyper-V box to 3. Nothing. No change. We rebooted. Still nothing.
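For what it’s worth, you don’t need a GUI to make that change: reg.exe will do it from the server core command prompt. A rough sketch, assuming the NIC in question turned out to live under the 0007 subkey (a made-up number – yours will differ):
reg query "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}" /s /f PriorityVlanTag
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}\0007" /v *PriorityVlanTag /t REG_SZ /d 3 /f
The first command hunts down which numbered subkey holds the value; the second sets it.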
Then we noticed that in the key for the NIC there was a subkey: Ndi\Params. In there was a key called *PriorityVlanTag. In that key were settings listing the options that were displayed in the GUI settings dialog, along with the value that got set. For the nVidia the value was 2, not 3. We duly changed the value and tried again. Nothing.
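If you want to see what options a driver exposes for a setting without the GUI, that Ndi\Params subkey is the place to look; something along these lines will dump it (again, 0007 is just an example subkey number):
reg query "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}\0007\Ndi\Params\*PriorityVlanTag" /s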
So we decided to update the drivers. This brings us back to where I left us earlier with the sc command.
To update a driver on server core, you need to unpack the driver files into a folder and then run the following: pnputil -i -a <explicit path to driver inf file>
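For example, if the unpacked driver ended up in C:\Drivers\nForce (the folder and .inf file name here are invented – point it at whichever .inf file is in your driver package), the command would look like: pnputil -i -a C:\Drivers\nForce\nvenet.inf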
After failing to get any other drivers to install, it looked like we already had the latest version and the system was not letting go. So we did some more research on the internet (what did we ever do before the internet?).
It transpires, for those of you with Sun servers, that the nVidia cards appear not to support VLAN ids on traffic, despite having all the settings to suggest that they do.
Darn.
A way forward
Fortunately we have a spare Broadcom on each of our Hyper-V host servers, so we are now switching the virtual switch binding from the nVidia to the Broadcom on each of our servers. We didn’t even have to hack around with registry settings: once we did that, the VLAN id settings in Hyper-V simply sprang into life.
The moral of this story is that if you want to use VLAN ids with Hyper-V and your server has nVidia network adapters (and certainly if it’s a Sun X2200-M2) then stop now before you lose your hair. You need to use another NIC if you have one, or install one if you don’t. Hopefully, however, the command line tools and registry keys above will help other travellers who find themselves in a similar situation to ours.