23 Responses

  1. joshatwell

    Great post! I was wondering if you had seen any issues, or had any additional information, about clock speed throttling when doing large numbers of vMotions, as mentioned by Eric Siebert http://de.twitter.com/ericsiebert/status/21051008

    I know that in our environment with 10G interfaces, when we put hosts in maintenance mode one of our monitoring applications shows huge spikes in response time for those VMs. Naturally it is safe to say that response time might be impacted. Who are we kidding – no pings are dropped, but the impact on latency sensitive applications is certainly evident.

    I haven't yet had an opportunity to dig into this more deeply, so I thought I would see if you might already have additional information on the subject. Thanks again for the post!

    1. @vcdxnz001

      Hi Josh, the technology you're referring to is SDPS (Stun During Page Send). It's only used when the page dirty rate of the VMs is faster than the ability to copy the pages to the destination host. SDPS only kicks in when a situation occurs that would otherwise mean the vMotion might fail; it's a failsafe mechanism. So if you have sufficient bandwidth, and you're not migrating workloads with a page dirty rate that would exceed the 10Gb/s of bandwidth, you shouldn't see any issues with SDPS. You may want to consider allowing more bandwidth for vMotion depending on the impact you're seeing on particular applications, or reducing the concurrency of vMotion operations by modifying some advanced settings. Any changes should be tested.
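
      As a rough back-of-envelope sketch (illustrative figures only, not a VMware tool – measure real dirty rates and usable bandwidth in your own environment), the condition is simply whether the VM dirties memory faster than the link can copy it:

      ```python
      # Rough check: does the page dirty rate outrun the vMotion link?
      # Illustrative numbers only - measure real values in your environment.

      PAGE_SIZE_BYTES = 4096  # x86 small page size

      def sdps_at_risk(dirty_pages_per_sec, vmotion_bandwidth_gbps, efficiency=0.9):
          """True if pages are dirtied faster than the link can copy them.

          efficiency is an assumed fraction of line rate usable for page copies.
          """
          dirty_bytes_per_sec = dirty_pages_per_sec * PAGE_SIZE_BYTES
          copy_bytes_per_sec = vmotion_bandwidth_gbps * 1e9 / 8 * efficiency
          return dirty_bytes_per_sec > copy_bytes_per_sec

      # A busy VM dirtying 200,000 pages/s (~0.8 GB/s) over a full 10GbE link:
      print(sdps_at_risk(200_000, 10))  # False - ~1.1 GB/s of copy capacity is enough
      # The same VM when QoS / vNIC carving squeezes vMotion down to 2Gb/s:
      print(sdps_at_risk(200_000, 2))   # True - SDPS would have to slow the vCPUs
      ```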

      However, if you are using QoS or vNICs on a UCS platform, you may have less vMotion bandwidth than necessary, and in some cases SDPS may kick in. But it only slows down the CPU of a VM for a few milliseconds at a time, which should not be enough to cause any noticeable response time impact. Very latency sensitive applications may require special attention and tuning.

      Generally things just work, but business critical and latency sensitive applications do require a different approach and more care and attention. Let me know what you find when you dig deeper, and if necessary get VMware involved. I'd be keen to see what you come up with and exactly what the situation is.

  2. Clearing up a misunderstanding around CPU throttling with vMotion

    […] was reading a nice article by Michael Webster on multi-nic vMotion. In the comment section Josh Attwell refers to a tweet by Eric Siebert around how CPUs are […]

  3. Paul Kelly

    Great article. Just wondering, though: with this configuration I'm not sure how to get it working with a vDS using LBT, NIOC and SIOC.

    Do you need a VMkernel port for each chunk of 10GbE you want to allow for vMotion?

    If you wanted to guarantee 50% of the 4 uplinks, to allow roughly 20Gbps of vMotion traffic, would you configure the environment something like this:

    vDS1 – LBT, SIOC, NIOC

    Management – 5 NIOC Shares, active uplinks 1,2,3,4

    VM Networking – 20 NIOC Shares, active uplinks 1,2,3,4

    iSCSI – 20 NIOC Shares, active uplinks 1,2,3,4

    FT – 15 NIOC Shares, active uplinks 1,2,3,4

    vMotion1 – 30 NIOC Shares, active uplinks 1,2,3,4

    vMotion2 – 30 NIOC Shares, active uplinks 1,2,3,4

    1. @vcdxnz001

      Hi Paul,

      vMotion1 – Active Uplink 1, all others Standby or Unused
      vMotion2 – Active Uplink 2, all others Standby or Unused

      All other port groups, except management, have all uplinks active, using Route Based on Physical NIC Load. Best practice for management is active/passive with failback set to No, so that you've always got it on the same side of the switch fabric and it doesn't flip-flop around. This reduces the chances of false isolation events a bit more.

      There's no need to configure LBT on the vMotion port groups; the other VMs will move around them. NIOC can still be used to control quality of service. I generally just use the low, normal and high share levels without specifying custom values. It's all relative, as long as the important traffic types get the bulk of the shares for when there is congestion. LBT and NIOC sort out most problems.
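
      To show how relative the shares are, here's a quick sketch (made-up helper and example values loosely based on your proposal, with vMotion as a single traffic type – not a VMware API): under congestion each traffic type gets its shares divided by the total, multiplied by the uplink bandwidth.

      ```python
      # NIOC shares are relative: under congestion each traffic type gets
      # shares / sum(shares) of the congested uplink. Illustrative values only.

      def nioc_split(shares, link_gbps=10):
          total = sum(shares.values())
          return {name: round(s / total * link_gbps, 2) for name, s in shares.items()}

      shares = {"Management": 5, "VM Networking": 20, "iSCSI": 20,
                "FT": 15, "vMotion": 30}
      print(nioc_split(shares))
      # {'Management': 0.56, 'VM Networking': 2.22, 'iSCSI': 2.22,
      #  'FT': 1.67, 'vMotion': 3.33}  (Gbps per congested 10GbE uplink)
      ```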

      Hope this helps.

      1. Paul Kelly

        Thanks. That all makes sense. I really do envy your ability to test this stuff in that epic lab of yours.

  4. Justin McDearis

    Good question Paul, and great answer Michael! Thank you for the very helpful information. I have been looking for an answer to this exact question for our VMware environment.

  5. Jason Boche

    Great article Mr. W. Keep up the good work & fantastic lab.

  6. Cisco Lab Setup | Cisco Skills

    […] The Good, The Great, and the Gotcha with Multi-NIC vMotion in vSphere 5 (longwhiteclouds.com) […]

  7. jaredbowden

    We have already gone to great lengths to set up our environment using the NIC teaming method described in KB1004048. Aside from the additional configuration, are there any negatives to using this method versus the method you described above? Thanks!

    1. @vcdxnz001

      Yes, in most cases you will not actually be able to use the aggregate bandwidth of multiple uplinks, as any individual stream will be limited to the bandwidth of a single uplink. With multi-NIC vMotion you can effectively use both NICs all the time. With the teaming method you are also restricted to using IP Hash as your load balancing algorithm, and all port groups on the vDS must be set to use it; for this reason multi-NIC vMotion and IP Hash load balancing are mutually exclusive. This won't be a problem if a single link is sufficient for vMotion traffic. You will need to ensure you use Network IO Control to prevent any one traffic type flooding out the others. You can read about the problems with EtherChannel and LACP vs load balancing based on physical NIC load here – http://longwhiteclouds.com/2012/04/10/etherchanne
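
      As a simplified illustration of why an IP Hash team doesn't help a single stream (the hash below is a stand-in for the route-based-on-IP-hash idea, not ESXi's exact implementation), the same source/destination pair always lands on the same uplink:

      ```python
      # Simplified picture of "route based on IP hash": one source/destination
      # IP pair always maps to the same uplink, so a single vMotion stream can
      # never use more than one NIC in the team. Not ESXi's actual hashing code.

      import ipaddress

      def ip_hash_uplink(src_ip, dst_ip, num_uplinks):
          src = int(ipaddress.ip_address(src_ip))
          dst = int(ipaddress.ip_address(dst_ip))
          return (src ^ dst) % num_uplinks

      # Every packet of this vMotion stream lands on the same uplink index:
      print(ip_hash_uplink("10.0.0.11", "10.0.0.12", 2))

      # Multi-NIC vMotion instead opens a stream per vmk pair, each pinned to
      # its own uplink, so a single migration can fill both NICs.
      ```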

  8. vSphere 5.1 Generally Available – Important Upgrade Considerations « Long White Virtual Clouds

    […] Unicast Flooding with Multi-NIC vMotion is targeted to be fixed in vSphere 5.1 U1 (and 5.0 U2) – see The Good, The Great, and the Gotcha with Multi-NIC vMotion in vSphere 5 […]

  9. VMguru

    I'm wondering what the difference will be with a link failure between standby and unused. If I have it set to standby and a link failure occurs during a vMotion, the vmkernel port will get reassigned and the vMotion should complete successfully.

    But what about if it's set to unused? What will happen to a vMotion that is using that path during a link failure – will it time out and fail, or will vMotion stop trying to use that vmk?

    1. @vcdxnz001

      If the other uplinks on your vMotion vmk port are set to unused and the active uplink fails, that vmk port will lose network access. But the remaining vmkernel ports configured for vMotion will continue the vMotion operations, and provided there is still one surviving vMotion vmk port, the vMotion operations will complete successfully. You can set the other uplinks to standby instead of unused, but given the redundancy at the vMotion vmk level there isn't necessarily a requirement for that. I think Multi-NIC vMotion will be far more heavily used once the unicast port flooding situation is addressed.
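
      A quick sketch of that failover behaviour (my own illustration of the teaming policy, not ESXi code): a standby uplink gets promoted when the active one fails, while an unused uplink is never selected, so that vmk port simply goes dark and the surviving vMotion vmk ports carry the load.

      ```python
      # Sketch of the active/standby/unused teaming behaviour described above.
      # Illustrative only - not ESXi code.

      def select_uplink(active, standby, link_up):
          """Return the uplink a vmk port would use, or None if none are usable.

          Uplinks marked 'unused' are simply not passed in - they are never selected.
          """
          for uplink in active + standby:  # standby promoted only if active is down
              if link_up.get(uplink, False):
                  return uplink
          return None

      links = {"uplink1": False, "uplink2": True}  # uplink1 has failed

      # uplink2 as Standby: the vMotion1 vmk fails over and keeps working.
      print(select_uplink(["uplink1"], ["uplink2"], links))  # 'uplink2'
      # uplink2 as Unused: the vMotion1 vmk loses connectivity; the migration
      # carries on over the surviving vMotion2 vmk on uplink2.
      print(select_uplink(["uplink1"], [], links))           # None
      ```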

  10. Workaround for Multi-NIC vMotion Unicast Flooding « Long White Virtual Clouds

    […] my previous article “The Good, The Great and the Gotcha with Multi-NIC vMotion in vSphere 5” I discussed an issue that could cause unicast port flooding. One of my large financial […]

  11. nikolab888

    Just great, man! You could publish "The Gotcha" part as its own article – it's worth it. Great explanation of what is going on. Thanks!

  12. nikolab888

    Guys, just be very careful not to do EtherChannel across two physically separate chassis (switches), unless they definitely support and provide multi-chassis EtherChannel.

  13. Greg

    What 10 Gb NICs do you have in your home lab, and where did you get them?
    🙂

  14. Sam

    Hi Michael

    Great post! I have a question, though.

    Say I keep my vMotion switch with two uplinks, both as active/active adapters (10Gb uplink each).

    Now I'm putting a host into maintenance mode and I have, say, 50 VMs to vMotion to another host – wouldn't both uplinks still be used?

    Maybe uplink1 for the vMotion of the first 4 VMs and uplink2 for the vMotion of the next 4 VMs at a time, eventually vMotioning all 50 VMs using both uplinks?

    If that's the case then I'm still utilizing both uplinks for vMotion even when they are in active/active mode, so how does setting them up as Multi-NIC vMotion help me in a multi-VM scenario?

    If you could explain this to me it would be appreciated.

    Thanks in advance.

