Etherchannel and IP Hash or Load Based Teaming?

Posted by Michael Webster on April 10, 2012 in Business Critical Applications, VMware

Etherchannel or Load Based Teaming has been a popular topic of conversation ever since Load Based Teaming was introduced in vSphere 4.1. Generally Etherchannel gets considered because people are not aware that Load Based Teaming exists as an option, because they are not familiar with how virtual networking in vSphere works, or because they’ve just always used it. It is quite common for non-VMware admins to think the virtual networking in vSphere acts just like a normal server, in which one uplink is active and the others are strictly for failover with no load balancing capability. This is not the case with vSphere. Of the five available teaming options, only one provides failover without any load balancing; the other four all provide load balancing across multiple host uplinks. If you want to know whether you should use Etherchannel or Load Based Teaming, and why, keep reading.

vSphere Network Teaming Options

This article assumes vSphere 4.1 or above, but even in previous versions Etherchannel may not be a good choice. The first thing you should know about vSphere networking is that the out of the box vNetwork Distributed Switch (vDS) supports five different teaming options, but does not support LACP (802.3ad) or Dynamic Etherchannel. Static Etherchannel (or static 802.3ad link aggregation) is the only form of Etherchannel currently supported. You can utilize LACP only if you deploy the Cisco Nexus 1000v or another add-on vDS to your environment. For this reason, this article will not discuss LACP in any detail.

The five teaming options are:

Route based on originating virtual port
Route based on IP Hash (only one supported with Static Etherchannel and Static 802.3ad)
Route based on Source MAC address
Route based on physical NIC load (Load Based Teaming or LBT)
Use explicit failover order (Not a load balancing algorithm)

All of the choices except “Use explicit failover order” offer uplink load balancing for the vSphere hosts, so you have four options if you are primarily concerned with load balancing the vSphere host uplinks. This, however, is not the same as a single virtual machine with a single IP address balancing its traffic across multiple uplinks, and in most cases even that has very little real benefit. I won’t explain all of the various options here, as they are covered in the VMware Product Documentation and the purpose of this article is to discuss Etherchannel and Load Based Teaming.

Etherchannel and IP Hash Load Balancing

IP Hash Load Balancing, which requires Static Etherchannel or static 802.3ad be configured on the switching infrastructure, uses a hashing algorithm based on source and destination IP address to determine which host uplink egress traffic should be routed through. VMware’s support and configuration of Etherchannel is explained in VMware KB 1004048. Frank Denneman explains the mechanics of how this works in his article IP-Hash versus LBT, and Ken Cline wrote a good explanation in his article The Great vSwitch Debate – Part 3.

It is possible for some workloads to load balance across multiple host uplinks, but in reality the use cases for this are few. To spread load over multiple links, a workload would have to send large amounts of traffic to many destinations, and the traffic pattern to each destination is unlikely to be the same. Each traffic flow will only ever go over a single host uplink, so an individual flow is limited to the bandwidth of a single host uplink. The load balancing is calculated on egress traffic only and is not utilization aware.
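To make the per-flow behaviour concrete, here is a minimal Python sketch of an IP hash style uplink selection. It assumes the commonly described form of the calculation (XOR of the source and destination IPv4 addresses, modulo the number of uplinks); the function name ip_hash_uplink and the sample addresses are hypothetical, and the real implementation in the hypervisor may differ in detail.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Return the index of the uplink a source/destination pair maps to.

    Assumption: hash = (source IP XOR destination IP) mod uplink count.
    """
    if num_uplinks < 1:
        raise ValueError("need at least one uplink")
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % num_uplinks


# The same pair of addresses always lands on the same uplink, no matter
# how much traffic the flow carries or how busy that uplink already is.
for dst in ("10.0.0.20", "10.0.0.21"):
    print(dst, "-> uplink", ip_hash_uplink("10.0.0.10", dst, 2))
```

The point of the sketch is that the choice depends only on the two addresses. A VM talking to a single destination never uses more than one uplink, and nothing in the calculation looks at current utilization.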

Etherchannel and IP Hash Load Balancing is technically very complex to implement and has a number of prerequisites and limitations such as:

Switching infrastructure must support static Etherchannel or static 802.3ad link aggregation and this must be configured for each host’s uplinks.
To enable network switch redundancy the network switches must support stacking or functionality similar to Virtual Port Channel.
Can’t use beacon probing.
Can’t configure standby or unused uplinks.
Only one Etherchannel is supported per vNetwork Distributed Switch (vDS).
vSwitch can be configured with between 1 and 8 uplinks.
Effective load balancing requires many source/destination combinations.

Configuring Etherchannel and IP Hash Load Balancing is a very technically complex process that can be error prone if the correct process is not followed. It is easy to knock a host’s management interfaces off the network during configuration (refer to VMware KB 1022751). It is very hard to achieve an even balance, and very easy for one or more uplinks to become congested while others are more lightly loaded, due to the nature of the IP hashing. In many cases there may be no performance gains, even though there are additional overheads to calculate the IP hashes for every conversation.
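The uneven balance is easy to reproduce on paper. The Python sketch below sums hypothetical per-flow bandwidth onto two uplinks using an assumed hash of source IP XOR destination IP, modulo the uplink count. The addresses, the Mbps figures, and the helper names uplink_for and load_per_uplink are all invented for illustration, not measurements or VMware APIs.

```python
import ipaddress


def uplink_for(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Assumed hash: (source IP XOR destination IP) mod uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % num_uplinks


def load_per_uplink(flows, num_uplinks: int):
    """Sum the Mbps of each (src, dst, mbps) flow onto its hashed uplink."""
    load = [0] * num_uplinks
    for src_ip, dst_ip, mbps in flows:
        load[uplink_for(src_ip, dst_ip, num_uplinks)] += mbps
    return load


# Hypothetical flows from three VMs on one host with two uplinks.
flows = [
    ("10.0.0.10", "10.0.0.20", 800),
    ("10.0.0.10", "10.0.0.22", 600),
    ("10.0.0.11", "10.0.0.21", 700),
    ("10.0.0.12", "10.0.0.31", 50),
]
print(load_per_uplink(flows, 2))  # [2100, 50]
```

Three of the four conversations hash to the same uplink, so one link carries 2100 Mbps while the other carries 50 Mbps. Because the hash ignores load, nothing moves the heavy flows afterwards.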

The best use case I can think of for IP Hash Load Balancing is a download server or a very high traffic single web server where no other load balancers are involved and it is not possible to deploy multiple VMs and load balancers for the purpose (which presents a single point of failure). In these cases you may achieve good load balance and utilization of multiple links, even if this load balancing is not dynamic and there is little control of congestion. But is the additional technical complexity for such a small use case really worth it? Do you truly need to be able to achieve more throughput from a single VM than a single uplink can sustain? In an environment with many VMs consolidated onto the host, do you want a single VM to be able to monopolize host networking to the detriment of others?

If your only reason for wanting to use Etherchannel and IP Hash Load Balancing is to distribute load over multiple host uplinks and provide redundancy and failover, then it is most likely not the best choice. In fact, if this is your only objective, any of the other teaming methods would be a better choice (excluding explicit failover order). The complexity and limitations, when in most cases there will be no performance gain, don’t seem to make it worthwhile. This brings us nicely onto Load Based Teaming.

Load Based Teaming

Load Based Teaming, or Route based on Physical NIC Load, is an option on the vDS that has been available since vSphere 4.1. It is a more dynamic teaming algorithm that evaluates uplink utilization every 30 seconds and, if utilization exceeds 75%, will move VMs between host uplink ports. LBT will work on standard access or trunk ports, and does not require any special switch configuration, such as stacking or Virtual Port Channels. Each VM will be limited to the bandwidth of a single host uplink. LBT works on both ingress and egress traffic flows. It is incredibly simple to set up and use. Frank Denneman wrote about LBT when it was first released and then followed up with IP-Hash versus LBT as previously mentioned.
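The evaluation loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not VMware's algorithm: the 75% threshold comes from the text, but the Uplink class, the rebalance function, the rule for picking which port to move (smallest first) and the traffic figures are all assumptions made for the example.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.75  # utilization trigger quoted in the article


@dataclass
class Uplink:
    name: str
    capacity_mbps: float
    ports: dict = field(default_factory=dict)  # VM port name -> Mbps

    @property
    def utilization(self) -> float:
        return sum(self.ports.values()) / self.capacity_mbps


def rebalance(uplinks):
    """Run one evaluation pass (imagine it firing every 30 seconds).

    Assumed policy: while an uplink is above the threshold, move its
    smallest port to the least utilized other uplink, provided that
    uplink stays at or below the threshold afterwards.
    """
    moves = []
    for src in uplinks:
        while src.utilization > THRESHOLD and len(src.ports) > 1:
            others = [u for u in uplinks if u is not src]
            if not others:
                break
            dst = min(others, key=lambda u: u.utilization)
            port, mbps = min(src.ports.items(), key=lambda kv: kv[1])
            if (sum(dst.ports.values()) + mbps) / dst.capacity_mbps > THRESHOLD:
                break  # moving would only shift the congestion
            del src.ports[port]
            dst.ports[port] = mbps
            moves.append((port, src.name, dst.name))
    return moves


vmnic0 = Uplink("vmnic0", 1000, {"vm-a": 500, "vm-b": 300, "vm-c": 100})
vmnic1 = Uplink("vmnic1", 1000, {"vm-d": 100})
moves = rebalance([vmnic0, vmnic1])
print(moves)
print(vmnic0.utilization, vmnic1.utilization)
```

In this example vmnic0 starts at 90% and vmnic1 at 10%; after one pass both sit at 50%. Each VM port is still pinned to exactly one uplink at any moment, which is why a single vNIC never exceeds the bandwidth of one uplink.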

The advantages of LBT are:

Its simplicity to set up and use on both the hosts and the network switches.
Greatly reduced configuration.
Significantly easier troubleshooting.
The dynamic utilization aware nature of the load balancing.
Works on both egress and ingress traffic.
Lower overheads than IP Hash Load balancing.
Reduces the chances of Network IO Control (NIOC) having to take action to control traffic congestion, whereas NIOC may have to play a more active part when IP Hash is being used.
None of the limitations of IP Hash Load Balancing and Etherchannel.

The only downside is that a single VM vNIC is limited to the bandwidth of a single host uplink. For a VM to effectively utilize multiple host uplinks it would need to be multi-homed by configuring it with multiple vNICs. This is a very small downside given how great the benefits are for the majority of workloads and situations, and given the sheer simplicity of the configuration and use.

What about LACP?

If you have the Nexus 1000v vDS in your environment (or vSphere 5.1 vDS) and you have switching infrastructure capable of supporting Virtual Port Channels then you may benefit from using LACP. LACP with the Nexus 1000v has 19 different hashing algorithms (vSphere 5.1 vDS has only one algorithm). LACP still suffers from the same technical complexity as Etherchannel and some of the same limitations, such as not being able to span switches without special configuration and support. However, if you are using the Nexus 1000v you have already chosen a somewhat more complex configuration in exchange for the other features and benefits it provides. The additional load balancing methods offer a much greater chance of achieving an even load balance in many more situations than static Etherchannel does, even though a single conversation is still limited to a single host uplink. If you have the Nexus 1000v and infrastructure capable of supporting LACP across multiple switches, so the switch is not a single point of failure, then this would be a better choice than using MAC pinning.

Final Word

Use Load Based Teaming unless you have no other option, and even then you should seriously consider not using Etherchannel and IP Hash Load Balancing. The complexity, increased overheads and lack of probable real world benefits make IP Hash a poor choice for most use cases. Remember LACP is not currently supported on the native VMware vDS and I think even if VMware decided to support LACP in future the case for LBT in preference to LACP would still be strong. I would be interested to hear your thoughts on this topic.

—

This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster. Copyright © 2012 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.


Paul Kelly April 10, 2012 at 8:25 pm | Permalink

Nice article. I've been thinking about this topic quite a bit lately, but you wrote about the concept better than I could. Almost all network engineers I come across want to use Etherchannel or LACP by default and it can be quite a task helping them to understand the issues around that design decision.

OddAngry April 11, 2012 at 12:09 pm | Permalink

Is etherchannel that complex? For me, it's been the network guy doing the work and grouping them while setting them up as trunk ports.

We've only had 1 problem, when the network guy missed one of the ports.

If vDS is already in use it makes sense, but in an environment still using vSS with etherchannel, is it worth implementing vDS for LBT (besides the other advantages of vDS)?

Chris April 11, 2012 at 1:37 pm | Permalink

While I'm not sure I would classify a static Etherchannel as technically complex, I do agree that its uses with VMware vSphere are rather limited. It is a shame that you have to purchase Ent+ to experience LB teaming.

1. @vcdxnz001 April 11, 2012 at 8:53 pm | Permalink
  
  The additional complexity comes in when you have to configure VPC or equivalent across two switches, with the additional configuration settings that are required and an additional layer to troubleshoot. But if you're already running Etherchannel with vSS and then moving to vDS, you would be better off sticking with what you've got, as introducing more change will add risk. There is still room for error and for knocking your hosts off the network, which you have to be careful of. If you're building a new environment or starting fresh, LBT is a much simpler option and far easier to implement.
  
Cwjking April 12, 2012 at 9:56 pm | Permalink

Good write up. I really enjoyed this article. I am reviewing FCOE on our environments and using this type of configuration on 10GB infrastructure makes a lot of sense. 1 uplink to a host? If that host has nothing but 10GB then open'er up and let it rip B)

1. @vcdxnz001 April 13, 2012 at 1:36 am | Permalink
  
  I have a customer considering FCoE in a UCS environment, they have Nexus 1000v vDS and will be using LACP on their 2 x 10G Uplinks with VPC's configured on the back end Nexus 5K switches. This is a valid and useful use case as LACP with source/dest ip and port will balance the traffic well and LBT isn't an option in this environment due to the Nexus 1000v anyway. The good thing is they can run LACP on the vmnic's and it won't impact the FCoE to the blades, which are still physically pinned to fabric A and B respectively. As far as the hypervisor is concerned though it's all just FC.
  
  1. Cwjking April 13, 2012 at 2:17 am | Permalink
    
    Yea, there are so many considerations to be made with FCoE and a lot of them are really valid. I still see a lot of "silos" in today's IT departments where people don't want to play in each other's sand box. Cisco UCS is great for referenced architectures. We use a similar set up with Cisco UCS Manager but I really believe it's underutilized. I just never even considered LBT before and this was just a fresh view on that. I will have to write some of my thoughts on my blog when I get some time.
    One of the things I find a bit challenging is whenever networking guys (or others) seem to think Cisco UCS is just like any other Nexus/Rack server solution… The misconceptions I sometimes deal with are huge but I try to take it in steps at a time… Sorry for the long reply, this is just a hot topic for me now. Thanks for the follow-up!
NFS on vSphere – A Few Misconceptions « Wahl Network April 20, 2012 at 9:23 am | Permalink

[…] of virtual machine networking, I suggest heading over to this excellent post entitled “Etherchannel and IP Hash or Load Based Teaming?” written by Michael […]
Cwjking April 27, 2012 at 9:35 pm | Permalink

Yeah, I prefer to keep it simple in most implementations but we all know design and use case are the key. We thought about using LACP but really the expertise isn't there on our side, or the resource for that matter. Sometimes keeping it KISS can mean a lot. Personally though I like it when networking can keep up with that stuff 🙂

Laurent Metzger May 25, 2012 at 9:38 am | Permalink

My opinion is not as radical as the one expressed in this article. Load-based load balancing will make good use of all the links in the ESX outgoing direction, but the switch will still have only one link for entering the ESX for a given MAC address, so I would not say that this limitation is small.

Other comment: basing traffic distribution on load has already been tried in the networking protocol area, with the EIGRP routing protocol. This protocol was not that successful because what sounds like a good idea turns out to be a bad idea. Traffic was always going to the least loaded path, which suddenly became the most loaded path, and back and forth. This led to traffic constantly changing path.

1. @vcdxnz001 May 25, 2012 at 8:55 pm | Permalink
  
  Hi Laurent, the limitation you mention is exactly the same for IP Hash load balancing and Ether Channel. That is egress only. At least with Load Based Teaming after the monitoring period (30s) both inbound and outbound (egress and ingress) traffic will have the effect of being balanced across the team. This is a significant advantage over Ether Channel. One of the reasons EIGRP never took off was because it was proprietary and didn't handle unequal paths very well. But we are not talking about only layer 3 load balancing, but layer 2 also. It is the way that Load Based Teaming has been designed that limits the possibility of flapping, which was a traditional problem of EIGRP. LBT is by far the easiest, least complex, and most effective way of load balancing a NIC team from an ESXi host, provided you have Enterprise Plus licensing.
  
Josh Odgers (VCDX#90) June 11, 2012 at 12:15 am | Permalink

Nice post Michael. I agree, LBT is a simple and effective load balancing option which suits most environments.

Cwjking June 11, 2012 at 4:17 pm | Permalink

I had an architect just the other day talk about how he would like to use LACP. You're right in that it still has the same problem. He asked me some questions about the ESX teaming method and how it works (LBT). I essentially told him that unless we are willing to go back and configure this on ALL hosts it would not be ideal. That changed his mind because for him it wouldn't really be worth the headache. I really like this solution, especially when working with 10GB FCOE infrastructures.

Jack Scagnetti June 27, 2012 at 9:38 pm | Permalink

LBT is great but is not technically feasible in some cases. A great example is vCloud Director. When using an LBT-backed portgroup with a routed network in vCD, it will cause a lot of network anomalies such as dropped packets or even complete loss of networking.

1. @vcdxnz001 June 27, 2012 at 11:22 pm | Permalink
  
  Hi Jack, That sounds like a configuration error to me. LBT is not supported with VCD-NI, but will work fine with Port Group Backed Network Pools, it'll also work fine on the external networks in vCD and the other networks defined on the vSwitch. But IP-Hash doesn't work so well with VCD-NI either, so your options with that are pretty much active/standby only anyway. Even route based on virtual port ID is superior in many cases to using IP hash, especially in a VCD environment.
  
VMguru August 18, 2012 at 4:01 pm | Permalink

Hi,

Is it possible to use both? Assume you have both storage vmkernel and vm traffic port groups on two 10GB nics. Can you set up an etherchannel for these two nics, IP hash on the vDs and VMkernel port group, and then override the port group teaming policy for the VM traffic port groups to use LBT instead of IP Hash? What would be the implications of doing this?

1. @vcdxnz001 August 18, 2012 at 8:03 pm | Permalink
  
  When using Etherchannel, IP Hash is the only supported teaming policy for all port groups connected to the vSwitch. So you can't mix and match. It's one or the other.
  
VME August 19, 2012 at 1:54 pm | Permalink

Hi Mike,

So will there be an issue having the VM network port group use IP hash on the vSwitch? Besides is it best to use LBT on a vmkernel port group used for NFS traffic on vDs?

1. @vcdxnz001 August 20, 2012 at 12:48 am | Permalink
  
  The issue is that the configuration is more restrictive, as all port groups must use IP Hash on the vSwitch. It is not dynamic and doesn't take account of ingress traffic. You can't mix and match. The physical network configuration is also more complex. But provided you have followed the correct configuration at all points and all port groups are set to IP Hash it should work fine. Just be mindful of the restrictions and limitations. It's more error prone than other methods and you will more often get better overall balance and throughput with LBT.
  
will August 29, 2012 at 3:46 pm | Permalink

LACP is available on VMware's VDS since 5.1. Always use LACP to prevent data center outages.

1. @vcdxnz001 August 29, 2012 at 4:06 pm | Permalink
  
  LACP does not prevent or reduce the probability of data center outages and due to its complexity could actually increase the probability. LACP has a number of restrictions that make it not appropriate in lots of cases. All of my arguments in the article still apply.
  
huh? November 30, 2012 at 7:03 pm | Permalink

Too bad there is no multi-link PPP option for the L2 datacenter.

All src/dest/hash based algo's are crap. Err, legacy.

VME December 15, 2012 at 8:31 pm | Permalink

Can I mix 2 different NICs with NIC teaming (like Broadcom & Intel)? I haven't seen any docs that specify that it is not supported.

1. @vcdxnz001 December 15, 2012 at 8:34 pm | Permalink
  
  Yes, that is fine. Provided they are the same speed. You can't have two NIC's of different speeds in the same team. Also make sure you stick to the vSphere Config Maximums. You will have trouble if you don't.
  
Example Architectural Decision – Virtual Switch Load Balancing Policy « CloudXC December 15, 2012 at 10:27 pm | Permalink

[…] Webster – Etherchanneling or Load based teaming? Frank Denneman – IP Hash verses […]
Rickard Nobel December 30, 2012 at 4:26 pm | Permalink

Nice article and write up on the NIC teaming policies. I agree that LBT is often the best and simplest to set up. The default, Port ID, is second best if you don't have the Enterprise Plus licence for vDS and LBT.

Wally January 9, 2013 at 6:24 am | Permalink

Great article! I think all of these points are being considered in VMware's development. At least I hope so. There seem to be more possibilities with a more evolved version.

I'm running 4 NICs per ESXi server, with a 2 port ether-channel going to two separate switches (non-stacked) and one pair in standby on different pNICs. I seem to be balancing fine according to the usage reports. The only scenario that bothers me is if one link goes down and a standby that goes to a different switch takes its place. I would get MAC flap. I wish VMware would put some intelligence there to make both standby NICs active. Does anyone know a way to accomplish this? I have monitoring in place and would have to manually intervene in the scenario I described. I am considering stacking so I do not have to worry about it and can make all 4 NICs active. Anyone else running like this successfully?

I wish VMware would put out a comprehensive design guide. It seems like the community is the place to go for design questions, which is great too. I see lots of folks going through pain to get there.

Keep up the great info!

1. @vcdxnz001 January 9, 2013 at 3:01 pm | Permalink
  
  Hi Wally,
  
  In your scenario the best option is to either stack or not use Etherchannel at all. In fact your scenario isn't a supported configuration. I'd suggest it's probably very unlikely to be taken into account during any development plans as a result of this. Many of the other load balancing options however would be supported. Depending of course on your licensing level. If your switches supported link state tracking there might be a way to automatically shutdown one port if another port goes offline, but stacking, or not using Etherchannel would still be far simpler. IP Hash load balancing is still only egress not ingress, so you have pretty much as good a chance of getting load balance by using route based on virtual port id in a lot of cases and this has the advantage of being supported in your type of setup.
  
Eric February 26, 2013 at 12:34 pm | Permalink

Nice write up! I'm a network engineer cross-training a bit and seeking to earn a VCP5. I came across your article doing web-based research on the subject. Being that I come from the network side of the house, the specifics of virtual to physical network integration are of particular interest to me. The documentation I have doesn't delve deep enough into this subject so I was pleased to find your thorough post. You do seem a little biased against IP Hash/Etherchanneling. In the span of several sentences you describe it as

"Etherchannel and IP Hash Load Balancing is technically very complex to implement and has a number of prerequisites and limitations such as:"

and a bit later

"Configuring Etherchannel and IP Hash Load balancing is a very technically complex process that can be error prone if the correct process is not followed."

That's laying it on a little thick! Any network admin/engineer worth his salt can configure a port-channel in his sleep.

You do make some very worthwhile points however and make a compelling argument for Load Based Teaming. LBT is a solid and in fact one of the best NIC Teaming options. It seems to me that NIC Teaming options listed in order of merit would have to be listed separately for vSS Non Enterprise Plus environments and for an Enterprise Plus licensed vDS environment. I'd rate them as so (Higher is more preferred)

vSS

#1.) Route based on originating virtual port ID or Route based on source MAC Hash (basically the same effect/level of complexity)

#2.) Route based on IP Hash WITH port-channeling (must be configured w/ static port-channeling) (more sophisticated than your other two options)

vDS (requires Enterprise Plus Licensing)

#1.) Route based on originating virtual port ID or Route based on source MAC Hash (basically the same effect/level of complexity)

#2.) Route based on IP Hash WITH port-channeling (must be configured w/ static port-channeling)

#3.) *****Route based on physical NIC Load/Load Based Teaming*****

#4.) Using NEXUS 1000v third-party vDS and LACP. (LACP with the Nexus 1000v has 19 different hashing algorithms (vSphere 5.1 vDS has only one algorithm). The MOST options to select an option MOST suited to a particular environment.)

1. @vcdxnz001 February 26, 2013 at 4:21 pm | Permalink
  
  I think the underlying assumption there is a Cisco networking environment or an environment that supports vPC or switch clustering. This assumption may not be true in many environments (older 6509 or similar). Also, my assumption is that any customers running a serious vSphere environment where business critical apps are being deployed will have Enterprise Plus licensing. I know this might not be true in all cases, but it is true for all of the enterprise, service provider, government and similar customers I interact with across the globe. The simple fact is that there is more configuration to do on the switch side and there are more failure scenarios that can cause downtime, such as a link coming back into service, VPC maintenance, or LACP on the N1KV not being configured properly. That makes it an inferior solution compared to the ease of applying teaming based on network load and, in the absence of that, using virtual port ID, which IMHO still beats Etherchannel for most environments. It's simply a matter of weighing up the configuration pros and cons and justifying why there is a requirement or need for Etherchannel, and in almost all cases there simply isn't any driver or requirement for it, or any benefit from it. So it's not a matter of whether it can be done, or whether a skilled network admin could do it easily; it's more a question of why you would do it and how it maps back to the requirements.
  
David March 23, 2013 at 11:40 pm | Permalink

Michael,

Nice article. Always enjoy reading your deep dives. Ran into an interesting design problem involving LBT. Should auto failback be enabled with LBT? I believe it should be set to no. By keeping failback off, you avoid potential network flapping while LBT will load balance across the nics during contention. Would love to hear your opinion on this.

Thanks,

1. @vcdxnz001 March 23, 2013 at 11:44 pm | Permalink
  
  Hi David, I agree with you. Especially when it comes to the management and vmkernel networks, failback should be set to no. This is a setting that many architects and admins miss in their designs. There are multiple reasons why you'd want to set it to no, including transient network events and maintenance, where a switch may have link status but might not be fully back in service yet; with failback off, traffic avoids being black holed. Beacon probing still has its place when there are multiple NICs, and an odd number of them, in a team. But in general, when using link status detection, failback set to no can be a good idea.
  
Keytoolz March 26, 2013 at 4:07 am | Permalink

Great article, much appreciated. I have a redundancy question about running LBT across separate switches.

I am running 2 hosts and creating a VDS for storage as an example (I’m creating separate VDS’s for management and virtual machines in a similar fashion to this storage VDS). I have 2 1GB pNICs dedicated to this on each host. I have 2 physical Cisco switches. Each host has its 2 pNICs connected to separate physical switches. The ports on the physical switches are all within the same VLAN (the storage VLAN of course). I am then creating an uplink group for the storage VDS, adding all four of the pNICs and enabling LBT. Can you tell me if this design is supported? Also can you tell me if I would place all 4 pNICs as active within the team?

Thanks for advice.

1. @vcdxnz001 March 26, 2013 at 9:09 pm | Permalink
  
  That configuration most certainly is supported and in fact that is why Route Based on Physical NIC Load (Load Based Teaming) is so powerful. It's just so simple. You plug in the NIC's to the switches, set them all to active, configure the teaming method and away you go. Of course each uplink needs to be in all the same VLAN's. But it's just that easy. No complication compared to trying to use EtherChannel or LACP. Good luck with your configuration, fully supported, easy to implement, just works.
  
Keytoolz March 30, 2013 at 1:29 am | Permalink

Excellent. I wasn't sure about using the separate switches but I'm glad to hear I'm doing it right. much appreciated.

VoiceDr. May 9, 2013 at 1:57 am | Permalink

This is a great write up. I personally believe in simple and it often provides the best survivability or recoverability simply in its simplicity.

That said, everything here speaks to load balancing issues and not recovery time. I am deploying multimedia applications and call center applications in VMware. The fault recovery tolerances of Voice and Video are much stricter than with typical server applications. I have tested voice and video media in the real world and my tests show IP Hash to provide quicker recovery than any of the other possible configurations. We have 2 x 10GB with SMLT (MEC) in Avaya switches. I have not had the opportunity to test 5.1 with Dynamic LACP for recovery yet.

I would like to hear your input and experiences with real-time media and failover scenarios using vDS.

1. @vcdxnz001 May 9, 2013 at 6:32 am | Permalink
  
  Hi, Link State Detection and Failover speed should be no different between static Etherchannel, LACP, or other failover methods all things being equal. It will depend however on how the physical switches are configured and what they support. I've run Voice, Video and other latency sensitive workloads on top of Teaming based on Physical NIC Load without any problems in 2 x 10G NIC hosts. What failure detection mechanism were you using? How are the physical switches configured and what make/model are they?
  
VoiceDr. May 10, 2013 at 1:11 am | Permalink

The switches for 10GB are Avaya VSP 7024. We have another cluster with Avaya ERS5650s and yet another with Cisco 6509s. For the Avaya clusters we used following link as a guideline. https://downloads.avaya.com/css/P8/documents/1001… The 6500 switches are not clustered and do not support MEC so they have simple trunk configurations as you suggest. All are using basic link state detection. The 10 gb connected servers only have 2 NICs and from the best practice documents Beacon probing is not recommended without a third.

1. @vcdxnz001 May 12, 2013 at 12:23 am | Permalink
  
  Beacon probing becomes nondeterministic with only 2 NICs, as it can't determine which NIC has failed. As a result it will send all outbound traffic down both uplinks. This isn't necessarily a problem, as one of the uplinks is down. With the 6500s you need to ensure that your uplink ports are configured for portfast trunk edge. This will ensure any port state transitions are completed as fast as possible. Sometimes it can take a while for the ports to pass traffic when they come back into service. Detecting a link state (layer 1) failure should be just as fast provided portfast is enabled, and you should not lose any significant packets between the different load balancing methods. Assuming you're using something other than an active/standby configuration, only a portion of the traffic should have to fail over during a link state change. How were you measuring the impact of link state changes and what was the cause of the link state changes? I have customers running CCTV systems on top of vSphere for large numbers of cameras and not noticing link state changes in their environments.
  
  Reply
  1. VoiceDr. May 13, 2013 at 5:31 pm | Permalink
    
    The configuration is TOR with 4 x 1Gb full OSPF equal cost interfaces back to the core. This OSPF interface as well as the server interface have been flawless. The server side is currently static LACP with IP hash across 2 x 10Gb links. Our applications do not really need all of this horsepower on the network, but it really simplifies installation and allows for much quicker vMotion etc. We have multiple UC applications running in the environment, which consists of three separate single cabinet clusters (similar to Avaya C-POD) separated geographically. The UC applications include Polycom video conferencing, MS Lync, MS UMS and most of the Avaya Aura applications. I am old school and believe the only tests that matter simulate the real world, so testing consists of generating several calls in each application and then doing things like pulling out the power cords, flashing to a different software load, unplugging a DAC on the data switches, or just pulling the power on one of the servers. The environment has performed respectably in our current configuration, with very few calls dropped. I am always looking for a better way, so I have been following many blogs such as yours with interest in finding the so-called ultimate configuration. I like your way of thinking on simplicity and would prefer not to depend on MEC and LACP. Not that they are hard, but it is just one more thing that introduces opportunity for human failure. In my tests so far, with BPDU guard, portfast etc. enabled as recommended, we can lose calls during failover with a configuration that depends on the server to detect failure without the IP hash LACP. Not all calls drop, but more than with the IP hash, and enough that if it happened in the middle of the day with hundreds of calls up it would be a problem ($$$). Our datacenter is large (we are a SaaS model) but my team's UC portion of the datacenter is small.
    So the slight administrative overhead associated with manually configuring virtual networks instead of a vDS for up to 30 servers is not a problem for us, unlike our datacenter with 100s of servers. But again, 30 servers offers 30 times the opportunity for human error. vDS has had some documented issues of servers not being able to find the vCenter with vDS. This is said to have been fixed in the latest patches. With 10Gb bandwidth at the server, load balancing for the sake of bandwidth is not my top concern; I simply want the best failover performance.
  2. @vcdxnz001 May 16, 2013 at 12:44 pm | Permalink
    
    Hi, I was one of the people who found the defects with the vDS and helped get them fixed. I'm confident it now works the way it was supposed to (provided you are on the patched version of vSphere/vCenter). I agree with your testing approach but I'm not sure why you're seeing the results you are with regards to dropped calls when link state changes on a non-EtherChannel IP-Hash link. What type of servers are you using and what make / model of network cards? The failover should be seamless to the applications.
@AB May 24, 2013 at 4:22 pm | Permalink

Hi there, I am in the process of designing a vCloud implementation (small scale) and wondered what the best way to configure the NICs and load balancing algorithm on the ESX resource cluster would be. I have 8 NICs in each host and I would like to use 2 as a vSS for management and vMotion. Then I would like to use 6 as a vDS for all other port groups, client networks etc. Should I configure the load balancing algorithm to be route based on originating port ID on the vDS? The servers will be connected to a Brocade 6510 stack (VCS) and I wanted to split some of the connectivity between the switches. I would like to avoid using Etherchannels where possible, as I agree they do add complexity. Can you advise if I could have 3 NICs from the vDS connected to one Brocade switch and the remaining 3 NICs from that same vDS connected to the other Brocade switch, and use route based on originating port ID as the load balancing algorithm? Would it then just be a case of ensuring the ports on the switch are configured as trunk ports for the various VLANs?

Reply
1. @vcdxnz001 June 6, 2013 at 8:21 am | Permalink
  
  Hi Angie, sorry for the late reply; it seems I missed this comment. It will depend on which version of vCloud Director and vSphere you plan to use and what type of network pools you plan on using. If you were to use VXLAN, say, then a port group with 6 NICs assigned and port ID or load based teaming isn't going to work; in fact load based teaming isn't supported. With port ID all the virtual networks will go down one interface. In this case you'd want to use Etherchannel or LACP, which then adds complexity as you know. We normally recommend that you put External Networks on one vDS and Network Pools on another vDS, but this is mainly to separate the different types of traffic.
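A rough sketch in Python of why port ID teaming ends up on one interface here. It rests on two assumptions that are not spelled out above: that port ID teaming can be approximated as a modulo of the virtual port number over the uplinks (the real placement logic is internal to the vSwitch), and that all encapsulated traffic leaves the host through a single tunnel endpoint port. The port numbers and `vmnic` names are invented for the example.

```python
def uplink_for_port(port_id: int, uplinks: list[str]) -> str:
    # Assumed simplification: each virtual port is pinned to one uplink by modulo.
    return uplinks[port_id % len(uplinks)]


uplinks = [f"vmnic{i}" for i in range(2, 8)]  # the 6 NICs assigned to the vDS
vm_ports = range(10, 22)                      # 12 hypothetical VM virtual ports

# Plain port groups: every VM has its own virtual port, so VMs spread out.
used_without_encapsulation = {uplink_for_port(p, uplinks) for p in vm_ports}

# Encapsulated traffic: every VM's frames leave via the same endpoint port.
tunnel_endpoint_port = 40                     # hypothetical port number
used_with_encapsulation = {uplink_for_port(tunnel_endpoint_port, uplinks) for _ in vm_ports}

print(len(used_without_encapsulation), len(used_with_encapsulation))
```

Under those assumptions the twelve VM ports cover all six uplinks, while the encapsulated traffic only ever uses the one uplink the endpoint port is pinned to.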
  
  A lot will depend on what requirements there are for your project. If you're using Portgroup Backed or VLAN Backed network pools then your options are different, and different again if using vCD-NI. But if your switches are stacked together and are essentially one logical switch, you do have the option of running LACP or Etherchannel across them with 3 physical NICs on each physical switch in the stack, but it would pay to check your switch vendor's documentation and guidance. I'm not that familiar with Brocade switches. If you were happy just using one uplink of, say, an active / passive pair, you could use 3 x vDS with 2 uplinks each in an active / passive config with route based on port ID. Don't forget about the MTU requirements. Hope this helps a little.
  
  Reply
VCAP5 - DCD resources November 27, 2013 at 8:01 am | Permalink

[…] […]
Renmo December 11, 2013 at 6:10 pm | Permalink

Hello Michael,

I'm designing an environment and LBT was chosen for load balancing.
My environment has Nexus 7Ks configured with vPC, and my servers have 2 x 10Gb uplinks. The question that was raised is this: suppose a backup job is running and its session is using pNIC1. Once pNIC1 is overloaded and the load balancing algorithm kicks in, the session will move to pNIC2. The network guy claims that the session will be disconnected and that the ports connected to pNIC1 in my vPC will drop the packets, leading to a failed backup job. Is that right?

Reply
1. vcdxnz001 December 11, 2013 at 6:14 pm | Permalink
  
  Hi Renmo,
  
  The session shouldn't be disconnected. But there is no need for vPC. When the VM changes NIC a gARP will update the ARP table and there should be no disruption to traffic. LBT removes the need for vPC.
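As a rough illustration of why the session survives, here is a small Python model. It is a sketch under stated assumptions, not VMware's implementation: the 75% trigger is assumed, the class and function names are invented, and the physical switch is reduced to a table mapping a MAC address to a port that is refreshed when the host announces the VM on its new uplink.

```python
from dataclasses import dataclass, field

LBT_THRESHOLD = 0.75  # assumed trigger: uplink utilisation above 75% of capacity


@dataclass
class PhysicalSwitch:
    mac_table: dict = field(default_factory=dict)  # MAC address -> switch port

    def learn(self, mac: str, port: str) -> None:
        # Any frame sourced from `mac`, including a notification frame, refreshes the entry.
        self.mac_table[mac] = port

    def egress_port(self, dst_mac: str):
        return self.mac_table.get(dst_mac)


def lbt_rebalance(vm_mac, placement, utilisation, uplink_to_port, switch):
    """Move vm_mac to the least utilised uplink if its current uplink is saturated."""
    current = placement[vm_mac]
    if utilisation[current] <= LBT_THRESHOLD:
        return current  # nothing to do
    target = min(utilisation, key=utilisation.get)
    placement[vm_mac] = target
    # The host announces the VM on the new uplink, so the switch relearns the MAC.
    switch.learn(vm_mac, uplink_to_port[target])
    return target


switch = PhysicalSwitch()
uplink_to_port = {"vmnic0": "Eth1/1", "vmnic1": "Eth1/2"}
vm_mac = "00:50:56:aa:bb:cc"
placement = {vm_mac: "vmnic0"}
switch.learn(vm_mac, "Eth1/1")

lbt_rebalance(vm_mac, placement, {"vmnic0": 0.9, "vmnic1": 0.2}, uplink_to_port, switch)
print(placement[vm_mac], switch.egress_port(vm_mac))
```

The VM keeps the same MAC and IP address throughout, so the backup session's endpoints never change. Only the switch port that return traffic is forwarded to is updated.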
  
  Reply
Renmo December 11, 2013 at 6:42 pm | Permalink

Hello Michael,

It's already configured with vPC, but what I really need is an official document to give to my client. Do you know where I can get one? 🙂

Reply
1. vcdxnz001 December 11, 2013 at 7:07 pm | Permalink
  
  You’d have to go back to the VMware documentation, and maybe some KBs. But there are no official documents for every combination. There are architectural decisions and trade-offs to be made for each situation. Then of course you have to verify too.
  
  Reply
Enterprise NAS with vSphere 5.1, EMC VNX VG8 & Symmetrix VMAX 20K | vcdx133.com June 4, 2014 at 5:00 am | Permalink

[…] Long White Virtual Clouds EtherChannel and IP Hash or Load Based Teaming? […]
vSphere LBT and Enterprise Plus Switches - a VCDX Constraint ? June 19, 2014 at 9:25 pm | Permalink

[…] http://longwhiteclouds.com/2012/04/10/etherchannel-and-ip-hash-or-load-based-teaming/ […]
vCoffee Links #7 – Web-scale Wednesday » vHersey - VCDX Two to the Seventh Power (#128) June 25, 2014 at 10:31 pm | Permalink

[…] Nice post from @vcdxnz001 on choosing a network teaming or load balancing option in a vSphere environment: Etherchannel and IP Hash or Load Based Teaming? […]
VMware Distributed vSwitch LACP Configuration with Dell Force10 and Cumulus Linux | Long White Virtual Clouds November 25, 2015 at 7:53 am | Permalink

[…] my article Etherchannel and IP Hash or Load Based Teaming? I argued that using port channels and Etherchannel or LACP is an overly complex configuration that […]
