Thursday 23 February 2012

Digging deeper into the VDS construct

Original Post: http://www.yellow-bricks.com/2012/02/23/digging-deeper-into-the-vds-construct/

The following comment was made on my VDS blog and I figured while would investigate this a bit further:

It seems like the ESXi host only tries to sync the vDS state with the storage at boot and never again afterward. You would think that it would keep trying, but it does not.

Now lets look at the "basics" first. When an ESXi host boots it will get the data required to recreate the VDS structure locally by reading /etc/vmware/dvsdata.db and from esx.conf. You can view the dvsdata.db file yourself by doing:

net-dvs -f /etc/vmware/dvsdata.db

But is that all that is used? If you check the output of that file you will see that all data required for a VDS configuration to work is actually stored in there, so what about those files stored on a VMFS volume?

Each VMFS volume that holds a working directory (place where .vmx is stored) for at least 1 virtual machine that is connected to a VDS will have the following folder:

drwxr-xr-x    1 root     root          420 Feb  8 12:33 .dvsData

If you go to this folder you will see another folder. This folder appears to be some sort of unique identifier, and when comparing the string to the output of "net-dvs" it appears to be the identifier of the dvSwitch that was created.

drwxr-xr-x    1 root     root         1.5k Feb  8 12:47 6d 8b 2e 50 3c d3 50 4a-ad dd b5 30 2f b1 0c aa

Within this folder you will find a collection of files:

-rw------- 1 root root 3.0k Feb 9 09:00 106  -rw------- 1 root root 3.0k Feb 9 09:02 136  -rw------- 1 root root 3.0k Feb 9 09:00 138  -rw------- 1 root root 3.0k Feb 9 09:05 152  -rw------- 1 root root 3.0k Feb 9 09:00 153  -rw------- 1 root root 3.0k Feb 9 09:05 156  -rw------- 1 root root 3.0k Feb 9 09:05 159  -rw------- 1 root root 3.0k Feb 9 09:00 160  -rw------- 1 root root 3.0k Feb 9 09:00 161

It is no coincidence that these files are "numbers" and that these numbers resemble the port ID of the virtual machines stored on this volume. This is the port information of the virtual machines which have their working directory on this particular datastore. This port info is also what HA uses when it needs to restart a virtual machine which uses a dvPort. Let me emphasize that, this is what HA uses when it needs to restart a virtual machine! Is that all?

Well I am not sure. When I tested the original question I powered on the host without access to the storage system and powered on my storage system when the host was fully booted. I did not get this confirmed, but it seems to me that access to the datastore holding these files is somehow required during the boot process of your host, in the case of "static port bindings" that is. (Port bindings are more in-depth described here.)

Does this imply that if your storage is not available during the boot process virtual machines cannot connect to the network when they are powered on? Yes that is correct, I tested it and when you have a full power-outage and your hosts come-up before your storage you will have a "challenge". As soon as the storage is restored you probably will want to restart your virtual machines but if you do you will not get a network connection. I've tested this 6 or 7 times in total and not once did I get a connection.

As a workaround you can simply reboot your ESXi hosts. If you reboot the host the problem is solved and your virtual machines can be powered on and will get access to the network. Rebooting a host can be a painfully slow exercise though, as I noticed during my test runs in my lab. Fortunately there is a really simple workaround: restarting the management agents! Before you power-on your virtual machines and after your storage connection has been restored do the following from the ESXi shell:

services.sh restart

After the services have been restarted you can power-on your virtual machines and network connection will be restored!

Side note, on my article there was one question about the auto-expand property of static port groups and whether this was officially supported and where it was documented. Yes it is fully supported. There's a KB Article about how to enable it and William Lam recently blogged about it here. That is it for now on VDS…

No comments:

Post a Comment