One of the features released in vSphere 5 that many people may not be aware of is Multiple-NIC vMotion. It allows you to load balance a single vMotion migration, or multiple concurrent migrations, over multiple physical NICs. This is of significant benefit when you’ve got VMs and hosts with large amounts of memory, as vMotion migrations will complete significantly faster. So your Business Critical Applications with large amounts of memory and many CPUs can now migrate without disruption even faster. Below I’ll briefly cover the good and the great of this technology, and also a gotcha that you need to be aware of.
I thought we’d start with the good news. With vSphere 5 you can now split single or multiple vMotion streams over multiple NICs. Up to 4 x 10Gb/s NICs or 16 x 1Gb/s NICs are supported. This builds even further on the already impressive 30% improvement in vMotion performance over vSphere 4.1.
It is super easy to set up Multi-NIC vMotion, and it’s all explained in KB 2007467. To briefly cover the setup:
- You set up multiple vmkernel port groups, each with a different NIC as primary, all other NICs as standby or unused, and a different IP address on the same subnet.
- You then select the vMotion tick box on each vmkernel port.
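The two steps above amount to a simple rotation. Every vmkernel port gets one NIC as active and the rest as standby, plus its own IP on the shared subnet. The Python below is a minimal sketch that plans this layout before you click through the vSphere Client. It does not talk to vSphere at all. The port group names (`vMotion-1`, ...), the `first_host` offset and the example subnet are hypothetical. The only limits it enforces are the 4 x 10Gb/s and 16 x 1Gb/s maximums quoted above.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

# Supported maximums for Multi-NIC vMotion in vSphere 5, as quoted in the article.
MAX_NICS_BY_SPEED_GBPS = {10: 4, 1: 16}


@dataclass
class VmkPortPlan:
    portgroup: str                # hypothetical port group name
    active_nic: str               # the single uplink this vmkernel port uses
    standby_nics: List[str]       # every other uplink (could be "unused" instead)
    ip: str                       # unique address on the shared vMotion subnet
    vmotion_enabled: bool = True  # the vMotion tick box on the vmkernel port


def plan_multi_nic_vmotion(nics: List[str], speed_gbps: int, subnet: str,
                           first_host: int = 11) -> List[VmkPortPlan]:
    """Plan one vmkernel port per NIC, rotating which NIC is active."""
    limit = MAX_NICS_BY_SPEED_GBPS.get(speed_gbps)
    if limit is None:
        raise ValueError(f"no documented limit for {speed_gbps}Gb/s NICs")
    if not 2 <= len(nics) <= limit:
        raise ValueError(f"expected 2 to {limit} NICs at {speed_gbps}Gb/s, got {len(nics)}")
    net = ipaddress.ip_network(subnet)
    plans = []
    for i, nic in enumerate(nics):
        # Consecutive host addresses on the same subnet (hypothetical numbering scheme).
        ip = net.network_address + first_host + i
        if ip not in net or ip == net.broadcast_address:
            raise ValueError(f"{ip} is not a usable host address in {net}")
        plans.append(VmkPortPlan(
            portgroup=f"vMotion-{i + 1}",
            active_nic=nic,
            standby_nics=[n for n in nics if n != nic],
            ip=str(ip),
        ))
    return plans
```

For example, `plan_multi_nic_vmotion(["vmnic2", "vmnic3"], 10, "192.168.50.0/24")` produces two entries. `vMotion-1` has `vmnic2` active, `vmnic3` standby and 192.168.50.11. `vMotion-2` has `vmnic3` active, `vmnic2` standby and 192.168.50.12.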
Very simple. Now single vMotions and multiple concurrent vMotions will be load balanced over the NICs. There is absolutely no need to configure any complicated LACP or IP Hash load balancing to make this work. There is also no need to use Load Based Teaming (Route based on physical NIC load). You can use this with standard switches, so there is no need for a distributed switch. It doesn’t even require Enterprise Plus licenses, but as the benefits are mostly with VMs and hosts with lots of RAM, you’re probably going to have Enterprise Plus anyway.
I tested the performance of Multi-NIC vMotion with 2 x 10Gb/s NICs in my home lab and got almost 18Gb/s when using Jumbo Frames on vSphere 5. Hosts go into maintenance mode so fast you’d better not blink! I haven’t retested Multi-NIC vMotion since upgrading to vSphere 5 U1 and the latest patches. I plan to test it when Update 2 or the next vSphere release comes out.
Here are the test results from my previous article. You can see the Multi-NIC vMotion test at the bottom – vMotion 2 x 10G.
There is a condition that may occur during long-running vMotion operations on vSphere 5.0 prior to Update 2. It could cause all ports configured for the vMotion VLAN to be flooded with vMotion traffic. The way I understand it, this occurs when the physical switches’ MAC address tables start timing out the MACs (before the ARP timeout). The outbound traffic is split over multiple vmkernel ports and multiple NICs, but the ACKs come back from only one MAC. So after a while the physical network may time out the other MACs, as it’s not seeing any traffic from them. The transmissions to those MACs are still occurring, so the switches may start flooding them to every port that is configured for the vMotion VLAN. The problem is triggered by MAC timeouts around the 5 minute mark, so you are more likely to experience it in two cases. The first is with 1G vMotion NICs. The second is with 10G vMotion NICs that have Network I/O Control or QoS limits imposed. In both cases your migrations will generally take longer.
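The flooding mechanism can be illustrated with a toy model of a learning switch. The model makes some simplifying assumptions. There is a single VLAN, and a MAC entry is refreshed only by frames *sourced* from that MAC. There is also a single fixed aging timer, which real switches only approximate. The MAC names and port numbers are hypothetical. Two vMotion streams target two destination vmknics, but the ACKs for both come back from just one of them. The other entry therefore ages out, and frames towards it start being flooded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LearningSwitch:
    """Toy layer-2 switch: learns source MACs, ages them out, floods unknowns."""
    ports: List[int]
    aging_seconds: float = 300.0
    # MAC address -> (port it was learned on, last time it was seen as a source)
    table: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def receive(self, now: float, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Process one frame and return the list of egress ports."""
        self.table[src_mac] = (in_port, now)  # only the *source* MAC is refreshed
        entry = self.table.get(dst_mac)
        if entry is not None and now - entry[1] <= self.aging_seconds:
            return [entry[0]]  # known unicast: forward out of one port
        self.table.pop(dst_mac, None)  # aged out (or never learned)
        return [p for p in self.ports if p != in_port]  # unknown unicast: flood


def simulate(duration: int, aging_seconds: float = 300.0) -> Optional[int]:
    """Return the first second at which vMotion frames are flooded, or None."""
    # Hypothetical layout: source host vmknics on ports 1-2, destination host
    # vmknics on ports 3-4, other hosts on the vMotion VLAN on ports 5-6.
    sw = LearningSwitch(ports=[1, 2, 3, 4, 5, 6], aging_seconds=aging_seconds)
    sw.receive(0, 3, "dst-vmk1", "src-vmk1")  # both destination vmknics are
    sw.receive(0, 4, "dst-vmk2", "src-vmk2")  # learned once at the start
    for t in range(1, duration + 1):
        out1 = sw.receive(t, 1, "src-vmk1", "dst-vmk1")  # stream 1
        out2 = sw.receive(t, 2, "src-vmk2", "dst-vmk2")  # stream 2
        # ACKs for both streams leave the destination host from one MAC only,
        # so the "dst-vmk2" entry is never refreshed after t = 0.
        sw.receive(t, 3, "dst-vmk1", "src-vmk1")
        sw.receive(t, 3, "dst-vmk1", "src-vmk2")
        if len(out1) > 1 or len(out2) > 1:
            return t
    return None
```

With a 300 second aging timer, `simulate(600)` reports the first flood at second 301. In this model, flooding stops happening when the timer is longer than the migration, for example `aging_seconds=900`. That is the idea behind adjusting the switch aging values.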
To work around this problem you may be able to adjust the MAC aging timeout on your switches, depending on the type of switches you’ve got. The default MAC aging time on Cisco switches is normally 5 minutes. On the Dell 8024 10GBASE-T switch in my lab, the Address Aging value defaults to 301 seconds and is adjustable. Be careful if you choose to adjust these values, as there may be other consequences. Any adjustments should be tested, and only applied to the switches that connect directly to your vSphere hosts and carry the vMotion VLAN.
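A quick way to judge whether your migrations are at risk is to compare a rough migration time against the switch's aging timer. The sketch below gives a lower bound only. It assumes a single pass over guest memory at a steady throughput, and it ignores re-copying of dirtied pages and protocol overhead. It also treats 1 GB of RAM as 8 Gb on the wire. The throughput figures in the example are illustrative assumptions. The one exception is the almost 18Gb/s measured over 2 x 10Gb/s in the lab test.

```python
def min_copy_seconds(memory_gb: float, throughput_gbps: float) -> float:
    """Lower bound on migration time: one pass over guest memory.

    Assumes 1 GB of RAM = 8 Gb on the wire and a constant throughput;
    dirty-page re-sends and protocol overhead are ignored.
    """
    if throughput_gbps <= 0:
        raise ValueError("throughput must be positive")
    return memory_gb * 8 / throughput_gbps


def outlasts_mac_aging(memory_gb: float, throughput_gbps: float,
                       aging_seconds: float = 300) -> bool:
    """True if even the best-case migration runs past the MAC aging timer."""
    return min_copy_seconds(memory_gb, throughput_gbps) > aging_seconds


if __name__ == "__main__":
    # Illustrative scenarios for a 96GB VM against a 301 second aging timer.
    scenarios = [
        ("1 x 1Gb/s", 1.0),
        ("10Gb/s limited to 2Gb/s by NIOC/QoS", 2.0),
        ("2 x 10Gb/s Multi-NIC", 18.0),
    ]
    for label, gbps in scenarios:
        secs = min_copy_seconds(96, gbps)
        risk = "at risk" if outlasts_mac_aging(96, gbps, aging_seconds=301) else "ok"
        print(f"{label}: >= {secs:.0f} s ({risk} with 301 s aging)")
```

Consider a 96GB VM. On a single 1Gb/s NIC it needs at least 768 seconds, well past a 300 or 301 second aging timer. At 18Gb/s the same VM needs roughly 43 seconds.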
VMware is aware of this problem and has released a fix as part of vSphere 5.0 U2. The fix didn’t make it into ESXi 5 Patch 03, which was released on 13/07/2012 (07/12/2012 for those in the USA), but it did arrive in the next update release, vSphere 5.0 U2. I have updated this article now that the problem is fixed, to let you know which update you need to apply. Until you can apply it, I hope you are able to make use of Multi-NIC vMotion by applying the above workaround. At least configure it in your test environments and see how it goes.
Update (20130103): This issue is fixed on vSphere 5.0 U2. All you need to do is update your hosts to 5.0 U2 and this problem will be resolved. The workaround is no longer necessary.
I have just posted a workaround for this gotcha in an article titled Workaround for Multi-NIC vMotion Unicast Flooding in vSphere 5. This workaround, however, is unsupported, so use it at your own risk. It appears to work well on vSphere 5.0 and 5.0 U1. It should also work with 5.1 GA, but it hasn’t been tested there.
If you thought vMotion in vSphere 5 was already fast, you ain’t seen nothing yet: wait until you’ve experienced Multi-NIC vMotion. Even with this slight gotcha, it still has benefits if you can apply the workaround in your environment. Especially with very large VMs (more than 96GB RAM) and large hosts (more than 256GB RAM), it will significantly reduce your migration times.
Duncan Epping at Yellow Bricks has written a follow-up article on this titled Clearing up a misunderstanding around CPU throttling with vMotion and Multi-NIC vMotion in vSphere 5. I highly recommend that you read it. As noted in the comments on this article, the CPU throttling should not kick in under normal circumstances. It will only kick in if the vMotion would otherwise have failed. I’ll let you read Duncan’s article for the full story.
This article is also posted at the VMware Blog Site – Support Insider.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster. Copyright © 2012 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.