Quite often I am asked about the impact of selecting different virtual machine vCPU layouts via the VM CPU options. For example, what is the outcome of selecting 2 virtual processors with 1 core each versus 1 virtual processor with 2 cores? Let's have a look at the options in question...
In the example above we are creating a dual-vCPU VM, and we could select either 1 virtual socket with 2 cores or 2 virtual sockets with one core each. Does it make a difference which we choose?
No. At least not in this case, specifically where we have fewer than 8 vCPUs. More about this later.
Changing the core count is a useful way of overcoming potential guest application licensing issues, as discussed here: http://kb.vmware.com/kb/1010184. In terms of performance, however, the hypervisor schedules vCPUs onto the underlying physical CPUs in exactly the same way in both cases, so there is no performance enhancement. But is this ALWAYS true?
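By way of illustration, here is a minimal Python sketch (all values hypothetical) of the arithmetic behind the licensing workaround: pick a cores-per-socket value so the guest sees fewer sockets, while the total vCPU count, and therefore the scheduling, stays the same. The cpuid.coresPerSocket setting is the one covered in the KB article above; everything else here is purely for illustration.

# Minimal sketch: choose a cores-per-socket value so the guest sees no more
# sockets than its licence allows. All figures are hypothetical examples.
def cores_per_socket(total_vcpus, max_licensed_sockets):
    # Walk up the divisors of the vCPU count until the socket count fits.
    for cores in range(1, total_vcpus + 1):
        if total_vcpus % cores == 0 and total_vcpus // cores <= max_licensed_sockets:
            return cores
    return total_vcpus  # fall back to a single socket

vcpus = 4  # total vCPUs the VM needs (hypothetical)
cores = cores_per_socket(vcpus, max_licensed_sockets=2)
print(f'numvcpus = "{vcpus}"')              # total vCPU count (illustrative .vmx-style line)
print(f'cpuid.coresPerSocket = "{cores}"')  # cores per virtual socket, as per KB 1010184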
Not necessarily. To complicate matters we have something called NUMA (Non-Uniform Memory Access) and its virtual counterpart, vNUMA.
Modern CPUs are ferociously powerful and have no problem crunching through vast quantities of instructions. The problem is getting those instructions and data to the CPU from main memory, which is already busy servicing other CPUs: the shared memory bus becomes a bottleneck. To compensate, NUMA systems (such as many from IBM) break the CPUs and memory down into several blocks, or nodes, with each node having its own dedicated access to its local memory. Each NUMA node has a certain number of cores and a certain amount of local memory.
The general upshot of this is that the VMkernel is NUMA-aware and will attempt to schedule a VM within a single NUMA node, as this is more efficient from a memory management perspective. It is therefore beneficial if your VM's vCPU count AND memory allocation are able to fit into a single NUMA node.
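As a back-of-the-envelope check, here is a minimal Python sketch of the "does my VM fit in one node?" question. The node figures are hypothetical; substitute your own host's topology.

# Minimal sketch: does a proposed VM fit inside one NUMA node?
NODE_CORES = 4        # physical cores per NUMA node (hypothetical)
NODE_MEMORY_GB = 32   # local memory per NUMA node (hypothetical)

def fits_in_one_node(vcpus, memory_gb):
    return vcpus <= NODE_CORES and memory_gb <= NODE_MEMORY_GB

print(fits_in_one_node(vcpus=4, memory_gb=24))   # True  -> "local" VM
print(fits_in_one_node(vcpus=6, memory_gb=48))   # False -> candidate for a "wide" VM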
If this is not feasible due to application requirements (the VM needs more CPU and memory than is contained in a single node) then not to fear, the hypervisor will split the VM across multiple nodes quite happily (referred to as "wide VM" support). It is worth mentioning at this point that before vSphere 4.1, if a VM was larger than a single node, NUMA was disregarded entirely and any advantages of NUMA were lost.
Assuming that your VM is the right size for the job, it will perform better operating as a "wide VM" than if it were down-sized to fit within a single NUMA node. The image below compares a "local" VM to a "wide" VM. Both examples are acceptable, although the VM with 4 vCPUs will have greater memory locality than the 6 vCPU machine.
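To make the wide-VM case concrete, here is a purely illustrative Python sketch of how the 6 vCPU machine above might be divided across two hypothetical 4-core nodes. In reality the placement decision is made by the VMkernel's NUMA scheduler, not by us.

# Purely illustrative: assign a wide VM's vCPUs to NUMA nodes in order.
NODE_CORES = 4  # hypothetical cores per node

def split_across_nodes(vcpus):
    placement = {}
    for vcpu in range(vcpus):
        placement.setdefault(vcpu // NODE_CORES, []).append(vcpu)
    return placement

print(split_across_nodes(4))  # {0: [0, 1, 2, 3]}             -> fits in one node
print(split_across_nodes(6))  # {0: [0, 1, 2, 3], 1: [4, 5]}  -> spans two nodes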
But what about our original question of vCPU layout? Well, now we have to consider vNUMA!
If a VM has 8 or more vCPUs then vNUMA is enabled by default. vNUMA surfaces a version of the host's NUMA architecture up to the guest OS and allows it to make scheduling decisions based upon the revealed topology. It is easier for vNUMA to present an appropriate topology to your VM if the vCPU layout is flat, that is to say, one core per virtual socket (which is the default).
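If you are curious what topology has actually been surfaced to your guest, a Linux guest exposes it under /sys/devices/system/node. This minimal Python sketch simply lists the nodes and the CPUs in each; run it inside the VM.

# Minimal sketch: list the NUMA nodes and CPUs visible to the guest OS.
# Uses standard Linux sysfs paths; run inside the guest.
import glob, os

for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(os.path.join(node, "cpulist")) as f:
        cpus = f.read().strip()
    print(f"{os.path.basename(node)}: CPUs {cpus}")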
If on the other hand you MUST have multiple cores per virtual socket (due to licensing, for example), then it is best to mimic the physical NUMA layout of your host. A manually applied cores-per-socket configuration overrides vNUMA and may or may not match the physical ESXi NUMA topology; a mismatch results in degraded performance, which has been shown in tests to be as much as 31%.
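If you must use multiple cores per socket, one rough rule of thumb is to keep each virtual socket no larger than a physical NUMA node. The Python sketch below (host figures hypothetical) picks a sockets-and-cores split along those lines; it is a sizing aid only, not a substitute for checking your own host's topology.

# Rough sketch: pick a sockets x cores layout that mirrors the host's NUMA nodes.
HOST_CORES_PER_NODE = 8  # hypothetical; substitute your host's cores per node

def numa_aligned_layout(total_vcpus):
    # Keep each virtual socket no larger than one physical NUMA node.
    cores = min(total_vcpus, HOST_CORES_PER_NODE)
    while total_vcpus % cores:   # total vCPUs must divide evenly into sockets
        cores -= 1
    return total_vcpus // cores, cores   # (sockets, cores per socket)

print(numa_aligned_layout(16))  # (2, 8) on an 8-core-per-node host
print(numa_aligned_layout(12))  # (2, 6) -> two sockets of six cores each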
So to summarise!
1.) Try to size a VM's vCPU count and memory so that they fit within a single NUMA node (ideally as an even divisor of the node's resources), unless application demands rule this out.
2.) vNUMA is enabled by default in VMs that have 8 or more vCPUs. It is better to keep the vCPU layout of these VMs "flat" (one core per virtual socket).
3.) If this is not possible, design the socket and core count to match your host's NUMA topology.
This post has used broad strokes to cover a fairly complex subject, and the guidance will not be appropriate in every situation. For further information related to this post, please see the resources below.