52 Responses

  1. Paul Kelly

    Nice article. I've been thinking about this topic quite a bit lately, but you wrote about the concept better than I could. Almost all network engineers I come across want to use Etherchannel or LACP by default and it can be quite a task helping them to understand the issues around that design decision.

  2. OddAngry

    Is Etherchannel that complex? For me, it's been the network guy doing the work, grouping the ports and setting them up as trunks.

    We've only had one problem, when the network guy missed one of the ports.

    If a vDS is already in use it makes sense, but in an environment still using a vSS with Etherchannel, is it worth implementing a vDS just for LBT (besides the other advantages of a vDS)?

  3. Chris

    While I'm not sure I would classify a static Etherchannel as technically complex, I do agree that its uses with VMware vSphere are rather limited. It is a shame that you have to purchase Ent+ to experience LB teaming.

    1. @vcdxnz001

      The additional complexity comes in when you have to configure vPC or equivalent across two switches, with the extra configuration settings that requires and an additional layer to troubleshoot. If you're already running Etherchannel with a vSS and then moving to a vDS, you'd be better off sticking with what you've got, as introducing more change adds risk. There is still room for error and for knocking your hosts off the network, so you have to be careful. If you're building a new environment or starting fresh, LBT is a much simpler option and far easier to implement.
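      To give an idea of how little configuration LBT actually needs, here is a minimal PowerCLI sketch (the vCenter address and switch name are made-up examples, not from the article):

          # Minimal sketch: enable Load Based Teaming on every port group of a
          # distributed switch. 'dvSwitch01' and the vCenter name are illustrative.
          Connect-VIServer -Server vcenter.example.local
          Get-VDSwitch -Name 'dvSwitch01' |
              Get-VDPortgroup |
              Get-VDUplinkTeamingPolicy |
              Set-VDUplinkTeamingPolicy -LoadBalancingPolicy LoadBalanceLoadBased

      Compare that with building and troubleshooting a port channel plus vPC on the physical side.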

  4. Cwjking

    Good write-up. I really enjoyed this article. I am reviewing FCoE in our environments, and using this type of configuration on 10Gb infrastructure makes a lot of sense. One uplink to a host? If that host has nothing but 10Gb, then open 'er up and let it rip B)

    1. @vcdxnz001

      I have a customer considering FCoE in a UCS environment. They have the Nexus 1000v vDS and will be using LACP on their 2 x 10G uplinks, with vPCs configured on the back-end Nexus 5K switches. This is a valid and useful use case, as LACP with source/destination IP and port will balance the traffic well, and LBT isn't an option in this environment due to the Nexus 1000v anyway. The good thing is they can run LACP on the vmnics and it won't impact the FCoE to the blades, which are still physically pinned to fabric A and B respectively. As far as the hypervisor is concerned, though, it's all just FC.

      1. Cwjking

        Yeah, there are so many considerations to be made with FCoE, and a lot of them are really valid. I still see a lot of "silos" in today's IT departments where people don't want to play in each other's sandbox. Cisco UCS is great for reference architectures. We use a similar setup with Cisco UCS Manager, but I really believe it's underutilized. I had just never considered LBT before, and this was a fresh view on it. I will have to write up some of my thoughts on my blog when I get some time.
        One of the things I find a bit challenging is when networking guys (or others) seem to think Cisco UCS is just like any other Nexus/rack server solution… The misconceptions I sometimes deal with are huge, but I try to take it one step at a time… Sorry for the long reply; this is just a hot topic for me right now. Thanks for the follow-up!

  5. NFS on vSphere – A Few Misconceptions « Wahl Network

    […] of virtual machine networking, I suggest heading over to this excellent post entitled “Etherchannel and IP Hash or Load Based Teaming?” written by Michael […]

  6. Cwjking

    Yeah, I prefer to keep it simple in most implementations, but we all know it comes down to design and use case. We thought about using LACP, but the expertise really isn't there on our side, or the resources for that matter. Sometimes keeping it KISS can mean a lot. Personally, though, I like it when networking can keep up with that stuff 🙂

  7. Laurent Metzger

    My opinion is not as radical as the one expressed in this article. Load-based load balancing will make good use of all the links in the ESX outgoing direction, but the switch will still have only one link for traffic entering the ESX host for a given MAC address, so I would not say that this limitation is small.

    Another comment: basing traffic distribution on load has already been tried in the networking protocol world, with the EIGRP routing protocol. That protocol was not very successful, because what sounds like a good idea turned out to be a bad one: traffic always moved to the least loaded path, which then suddenly became the most loaded path, and back and forth. This led to traffic constantly changing paths.

    1. @vcdxnz001

      Hi Laurent, the limitation you mention is exactly the same for IP Hash load balancing and Etherchannel: that is egress only. At least with Load Based Teaming, after the monitoring period (30s) both outbound and inbound (egress and ingress) traffic has the effect of being balanced across the team. This is a significant advantage over Etherchannel. One of the reasons EIGRP never took off was that it was proprietary and didn't handle unequal paths very well. But we are not talking only about layer 3 load balancing here, but layer 2 as well. It is the way Load Based Teaming has been designed that limits the possibility of flapping, which was a traditional problem with EIGRP. LBT is by far the easiest, least complex, and most effective way of load balancing a NIC team from an ESXi host, provided you have Enterprise Plus licensing.

  8. Josh Odgers (VCDX#90)

    Nice post Michael. I agree, LBT is a simple and effective load balancing option which suits most environments.

  9. Cwjking

    I had an architect just the other day talk about how he would like to use LACP; you're right in that it still has the same problem. He asked me some questions about the ESX teaming method and how it works (LBT). I essentially told him that unless we are willing to go back and configure this on ALL hosts, it would not be ideal. That changed his mind, because for him it wouldn't really be worth the headache. I really like this solution, especially when working with 10Gb FCoE infrastructures.

  10. Jack Scagnetti

    LBT is great but is not technically feasible in some cases. A great example is vCloud Director. Using an LBT-backed port group with a routed network in vCD will cause a lot of network anomalies, such as dropped packets or even complete loss of networking.

    1. @vcdxnz001

      Hi Jack, that sounds like a configuration error to me. LBT is not supported with VCD-NI, but it works fine with port group-backed network pools; it will also work fine on the external networks in vCD and the other networks defined on the vSwitch. IP Hash doesn't work so well with VCD-NI either, so your options there are pretty much active/standby only anyway. Even route based on virtual port ID is superior to IP Hash in many cases, especially in a vCD environment.

  11. VMguru

    Hi,

    Is it possible to use both? Assume you have both storage VMkernel and VM traffic port groups on two 10Gb NICs. Can you set up an Etherchannel for these two NICs, use IP Hash on the vDS and the VMkernel port group, and then override the port group teaming policy for the VM traffic port groups to use LBT instead of IP Hash? What would be the implications of doing this?

    1. @vcdxnz001

      When using Etherchannel, IP Hash is the only supported teaming policy for all port groups connected to the vSwitch, so you can't mix and match; it's one or the other.
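      If you want to check for an accidental mix, you can audit the policy on every port group; a rough PowerCLI sketch (the switch name is just an example):

          # Illustrative audit: print the load balancing policy of each port group
          # on a distributed switch so any mix of IP Hash and LBT stands out.
          Get-VDSwitch -Name 'dvSwitch01' | Get-VDPortgroup | ForEach-Object {
              $policy = Get-VDUplinkTeamingPolicy -VDPortgroup $_
              '{0}: {1}' -f $_.Name, $policy.LoadBalancingPolicy
          }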

  12. VME

    Hi Mike,

    So will there be an issue with having the VM network port group use IP Hash on the vSwitch? Also, is it best to use LBT on a VMkernel port group used for NFS traffic on a vDS?

    1. @vcdxnz001

      The issue is that the configuration is more restrictive, as all port groups must use IP Hash on the vSwitch. It is not dynamic and doesn't take account of ingress traffic. You can't mix and match, and the physical network configuration is also more complex. But provided you have followed the correct configuration at all points and all port groups are set to IP Hash, it should work fine. Just be mindful of the restrictions and limitations. It's more error prone than other methods, and you will more often get better overall balance and throughput with LBT.

  13. will

    LACP has been available on VMware's vDS since 5.1. Always use LACP to prevent data center outages.

    1. @vcdxnz001

      LACP does not prevent or reduce the probability of data center outages, and due to its complexity it could actually increase that probability. LACP has a number of restrictions that make it inappropriate in many cases. All of my arguments in the article still apply.

  14. huh?

    Too bad there is no multi-link PPP option for the L2 datacenter.

    All src/dest hash-based algos are crap. Err, legacy.

  15. VME

    Can I mix two different NICs in a NIC team, like Broadcom and Intel? I haven't seen any docs that specify that it is not supported.

    1. @vcdxnz001

      Yes, that is fine, provided they are the same speed. You can't have two NICs of different speeds in the same team. Also make sure you stick to the vSphere Configuration Maximums; you will have trouble if you don't.
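      A quick way to eyeball this is to list the driver and link speed of every physical NIC per host; a hedged PowerCLI sketch (purely illustrative, not from the article):

          # List each physical NIC's link speed (in Mbps) and driver per host,
          # so mismatched speeds or vendors in a team are easy to spot.
          Get-VMHost | Get-VMHostNetworkAdapter -Physical |
              Select-Object VMHost, Name, BitRatePerSec,
                  @{ Name = 'Driver'; Expression = { $_.ExtensionData.Driver } } |
              Format-Table -AutoSize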

  16. Example Architectural Decision – Virtual Switch Load Balancing Policy « CloudXC

    […] Webster – Etherchanneling or Load based teaming? Frank Denneman – IP Hash verses […]

  17. Rickard Nobel

    Nice article and write-up on the NIC teaming policies. I agree that LBT is often the best and simplest to set up. The default Port ID is second best if you don't have the Enterprise Plus licence for the vDS and LBT.

  18. Wally

    Great article! I think all of these points are being considered in VMware's development. At least I hope so. There seem to be more possibilities with a more evolved version.

    I'm running four NICs per ESXi server, with a two-port Etherchannel going to two separate (non-stacked) switches and one pair in standby on different pNICs. I seem to be balancing fine according to the usage reports. The only scenario that bothers me is if one link goes down and a standby that goes to a different switch takes its place: I would get MAC flapping. I wish VMware would put some intelligence there to make both standbys active. Does anyone know a way to accomplish this? I have monitoring in place and would have to manually intervene in the scenario I described. I am considering stacking so I do not have to worry about it and can make all four NICs active. Is anyone else running like this successfully?

    I wish VMware would put out a comprehensive design guide. It seems like the community is the place to go for design questions, which is great too. I see lots of folks going through pain to get there.

    Keep up the great info!

    1. @vcdxnz001

      Hi Wally,

      In your scenario the best option is either to stack or to not use Etherchannel at all. In fact, your scenario isn't a supported configuration, so I'd suggest it's very unlikely to be taken into account in any development plans. Many of the other load balancing options, however, would be supported, depending of course on your licensing level. If your switches support link state tracking there might be a way to automatically shut down one port if another port goes offline, but stacking, or not using Etherchannel, would still be far simpler. IP Hash load balancing is still egress only, not ingress, so in a lot of cases you have pretty much as good a chance of getting balanced load by using route based on virtual port ID, and that has the advantage of being supported in your type of setup.
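      As a rough illustration only (made-up host, vSwitch and vmnic names), moving a standard vSwitch team off IP Hash to route based on originating virtual port ID with all NICs active looks something like this in PowerCLI:

          # Sketch: switch a standard vSwitch team to route based on originating
          # virtual port ID and make all four NICs active. Port groups that don't
          # override the policy inherit this setting.
          Get-VMHost -Name 'esxi01.example.local' |
              Get-VirtualSwitch -Name 'vSwitch0' |
              Get-NicTeamingPolicy |
              Set-NicTeamingPolicy -LoadBalancingPolicy LoadBalanceSrcId `
                  -MakeNicActive 'vmnic0','vmnic1','vmnic2','vmnic3'

      You would also remove the port channel configuration on the physical switch ports at the same time, so both sides of the link match.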

  19. Eric

    Nice write-up! I'm a network engineer cross-training a bit and seeking to earn a VCP5. I came across your article doing web-based research on the subject. Since I come from the network side of the house, the specifics of virtual-to-physical network integration are of particular interest to me. The documentation I have doesn't delve deep enough into this subject, so I was pleased to find your thorough post. You do seem a little biased against IP Hash/Etherchanneling, though. In the span of several sentences it is described as

    "Etherchannel and IP Hash Load Balancing is technically very complex to implement and has a number of prerequisites and limitations such as:"

    and a bit later

    "Configuring Etherchannel and IP Hash Load balancing is a very technically complex process that can be error prone if the correct process is not followed."

    That's laying it on a little thick! Any network admin/engineer worth his salt can configure a port-channel in his sleep.

    You do make some very worthwhile points, however, and a compelling argument for Load Based Teaming. LBT is solid and in fact one of the best NIC teaming options. It seems to me that NIC teaming options listed in order of merit would have to be listed separately for vSS/non-Enterprise Plus environments and for an Enterprise Plus licensed vDS environment. I'd rate them like so (higher is more preferred):

    vSS

    #1.) Route based on originating virtual port ID or Route based on source MAC hash
         (basically the same effect/level of complexity)

    #2.) Route based on IP Hash WITH port-channeling (must be configured with static port-channeling)
         (more sophisticated than your other two options)

    vDS (requires Enterprise Plus licensing)

    #1.) Route based on originating virtual port ID or Route based on source MAC hash
         (basically the same effect/level of complexity)

    #2.) Route based on IP Hash WITH port-channeling (must be configured with static port-channeling)

    #3.) *****Route based on physical NIC load/Load Based Teaming*****

    #4.) Using the Nexus 1000v third-party vDS and LACP
         (LACP with the Nexus 1000v has 19 different hashing algorithms, while the vSphere 5.1 vDS has only one. The MOST options to select an option MOST suited to a particular environment.)

  20. David

    Michael,

    Nice article. I always enjoy reading your deep dives. I ran into an interesting design problem involving LBT: should auto failback be enabled with LBT? I believe it should be set to no. By keeping failback off, you avoid potential network flapping, while LBT will still load balance across the NICs during contention. Would love to hear your opinion on this.
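    To make the question concrete, the two settings sit side by side on the dvPortgroup teaming policy; a hypothetical PowerCLI sketch of the combination I mean (the port group name is made up):

        # Illustrative only: combine Load Based Teaming with failback disabled
        # on a single distributed port group ('dvPG-VM-Traffic' is a made-up name).
        Get-VDPortgroup -Name 'dvPG-VM-Traffic' |
            Get-VDUplinkTeamingPolicy |
            Set-VDUplinkTeamingPolicy -LoadBalancingPolicy LoadBalanceLoadBased `
                -EnableFailback $false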

    Thanks,

  21. Keytoolz

    Great article, much appreciated. I have a redundancy question about running LBT across separate switches.

    I am running two hosts and creating a vDS for storage as an example (I'm creating separate vDSs for management and virtual machines in a similar fashion to this storage vDS). I have two 1Gb pNICs dedicated to this on each host and two physical Cisco switches. Each host has its two pNICs connected to separate physical switches. The ports on the physical switches are all within the same VLAN (the storage VLAN, of course). I am then creating an uplink group for the storage vDS, adding all four of the pNICs, and enabling LBT. Can you tell me if this design is supported? Also, can you tell me if I would place all four pNICs as active within the team?

    Thanks for the advice.

  22. Keytoolz

    Excellent. I wasn't sure about using the separate switches, but I'm glad to hear I'm doing it right. Much appreciated.

  23. VoiceDr.

    This is a great write-up. I personally believe in keeping things simple, and that often provides the best survivability and recoverability, simply by virtue of its simplicity.

    That said, everything here speaks to load balancing issues and not recovery time. I am deploying multimedia and call center applications on VMware, and the fault recovery tolerances of voice and video are much stricter than for typical server applications. I have tested voice and video media in the real world, and my tests show IP Hash provides quicker recovery than any of the other possible configurations. We have 2 x 10Gb with SMLT (MEC) on Avaya switches. I have not had the opportunity to test 5.1 with dynamic LACP for recovery yet.

    I would like to hear your input and experiences with real-time media and failover scenarios using vDS.

  24. VoiceDr.

    The 10Gb switches are Avaya VSP 7024s. We have another cluster with Avaya ERS 5650s and yet another with Cisco 6509s. For the Avaya clusters we used the following link as a guideline: https://downloads.avaya.com/css/P8/documents/1001… The 6500 switches are not clustered and do not support MEC, so they have simple trunk configurations as you suggest. All are using basic link state detection. The 10Gb-connected servers only have two NICs, and per the best practice documents beacon probing is not recommended without a third.

  25. @AB

    Hi there, I am in the process of designing a vCloud implementation (small scale) and wondered what the best way to configure the NICs and load balancing algorithm on the ESX resource cluster would be. I have eight NICs in each host, and I would like to use two as a vSS for management and vMotion. Then I would like to use six as a vDS for all other port groups, client networks, etc. Should I configure the load balancing algorithm to be route based on originating port ID on the vDS? The servers will be connected to a Brocade 6510 stack (VCS), and I wanted to split some of the connectivity between the switches. I would like to avoid using Etherchannels where possible, as I agree it does add complexity. Can you advise if I could have three NICs from the vDS connected to one Brocade switch and the remaining three NICs from that same vDS connected to the other Brocade switch, and use route based on originating port ID as the load balancing algorithm? Would it then just be a case of ensuring the ports on the switch are configured as trunk ports for the various VLANs?

  26. VCAP5 - DCD resources

    […] […]

  27. Renmo

    Hello Michael,

    I'm designing an environment, and LBT was chosen for load balancing.
    My environment has Nexus 7Ks configured with vPC, and my servers have 2 x 10Gb uplinks. The question that was raised is this: if a backup job is running and the session is using pNIC1, once that uplink is overloaded and the load balancing algorithm kicks in, the session will move to pNIC2. The network guy claimed that the session will be disconnected and that the ports connected to pNIC1 in my vPC will drop the packets, leading to a failed backup job. Is that correct?

  28. Renmo

    Hello Michael,

    It's already configured with vPC, but what I really need is an official document to give to my client. Do you know where I can get that from? 🙂

  29. Enterprise NAS with vSphere 5.1, EMC VNX VG8 & Symmetrix VMAX 20K | vcdx133.com

    […] Long White Virtual Clouds EtherChannel and IP Hash or Load Based Teaming? […]

  30. vSphere LBT and Enterprise Plus Switches - a VCDX Constraint ?
  31. vCoffee Links #7 – Web-scale Wednesday » vHersey - VCDX Two to the Seventh Power (#128)

    […] Nice post from @vcdxnz001 on choosing a network teaming or load balancing option in a vSphere environment: Etherchannel and IP Hash or Load Based Teaming? […]

  32. VMware Distributed vSwitch LACP Configuration with Dell Force10 and Cumulus Linux | Long White Virtual Clouds

    […] my article Etherchannel and IP Hash or Load Based Teaming? I argued that using port channels and Etherchannel or LACP is an overly complex configuration that […]
