Etherchannel or Load Based Teaming has been a popular topic of conversation ever since Load Based Teaming was introduced in vSphere 4.1. Generally the consideration for Etherchannel starts because people are not aware that Load Based Teaming exists as an option, because they are not familiar with how virtual networking in vSphere works, or because they’ve just always used it. It is quite common for non-VMware admins to think that virtual networking in vSphere acts just like a normal server, where one uplink is active and the others are strictly for failover with no load balancing capability. This is not the case with vSphere: of the five available teaming options, only one provides failover without any load balancing, while the other four all load balance across multiple host uplinks. If you want to know whether you should use Etherchannel or Load Based Teaming, and why, keep reading.
vSphere Network Teaming Options
This article assumes vSphere 4.1 or above, but even in previous versions Etherchannel may not be a good choice. The first thing you should know about vSphere networking is that the out-of-the-box vNetwork Distributed Switch (vDS) supports five different teaming options, but does not support LACP (dynamic 802.3ad link aggregation, also known as dynamic Etherchannel). Static Etherchannel (static 802.3ad link aggregation) is the only form of Etherchannel currently supported. You can utilize LACP only if you deploy the Cisco Nexus 1000v or another add-on vDS in your environment (or the native vDS in vSphere 5.1, touched on later). For this reason this article will not discuss LACP in any detail.
The five teaming options are:
Route based on originating virtual port
Route based on IP Hash (the only option supported with static Etherchannel and static 802.3ad)
Route based on Source MAC address
Route based on physical NIC load (Load Based Teaming or LBT)
Use explicit failover order (Not a load balancing algorithm)
All of the choices except “Use explicit failover order” offer uplink load balancing for the vSphere hosts. So you have four options if you are primarily concerned with load balancing the vSphere host uplinks. This, however, is not the same as a single virtual machine with a single IP address balancing its traffic across multiple uplinks, and in most cases even that has very little real benefit. I won’t explain all of the various options here, as they are covered in the VMware Product Documentation and the purpose of this article is to discuss Etherchannel and Load Based Teaming.
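For reference, when these policies are set programmatically (for example with the pyVmomi Python SDK), each option maps to a short policy string in the vSphere API. The mapping below is a quick sketch for scripting purposes; the identifiers are the ones I believe the API uses, so verify them against your vSphere API documentation before relying on them.

```python
# Teaming option names as shown in the vSphere Client, mapped to the policy
# strings used by the vSphere API (verify against your API version).
TEAMING_POLICIES = {
    "Route based on originating virtual port": "loadbalance_srcid",
    "Route based on IP hash":                  "loadbalance_ip",
    "Route based on source MAC address":       "loadbalance_srcmac",
    "Route based on physical NIC load (LBT)":  "loadbalance_loadbased",
    "Use explicit failover order":             "failover_explicit",
}
```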
Etherchannel and IP Hash Load Balancing
IP Hash Load Balancing, which requires static Etherchannel or static 802.3ad link aggregation to be configured on the switching infrastructure, uses a hashing algorithm based on source and destination IP address to determine which host uplink egress traffic should be routed through. VMware’s support and configuration of Etherchannel is explained in VMware KB 1004048. Frank Denneman explains the mechanics of how this works in his article IP-Hash versus LBT, and Ken Cline wrote a good explanation in his article The Great vSwitch Debate – Part 3.
It is possible for some workloads to load balance across multiple host uplinks, but in reality the use cases for this are few. To balance across multiple links a workload would have to be sending large amounts of traffic to many different destinations (a pattern few workloads match). Each traffic flow will only ever go over a single host uplink, so an individual flow is limited to the bandwidth of one uplink. The load balancing is calculated on egress traffic only and is not utilization aware.
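To see why many source/destination pairs are needed, here is a small Python sketch of an XOR-style IP hash of the kind IP Hash load balancing uses. It is an illustration of the concept only, not VMware’s exact implementation: a given source/destination pair always hashes to the same uplink, so a single flow can never use more than one uplink.

```python
import ipaddress

def ip_hash_uplink(src_ip, dst_ip, num_uplinks):
    """Illustrative XOR-based hash: the same src/dst pair always selects the
    same uplink, so one conversation is pinned to exactly one uplink."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % num_uplinks

# One VM talking to many destinations can spread across uplinks...
for dst in ("10.0.0.5", "10.0.0.6", "10.0.0.7", "10.0.0.8"):
    print(dst, "->", "vmnic%d" % ip_hash_uplink("192.168.1.10", dst, 2))

# ...but a single conversation always lands on the same uplink, regardless of load.
print("vmnic%d" % ip_hash_uplink("192.168.1.10", "10.0.0.5", 2))
```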
Etherchannel and IP Hash Load Balancing is technically complex to implement and has a number of prerequisites and limitations, such as the following (a host-side sanity check is sketched after the list):
- Switching infrastructure must support static Etherchannel or static 802.3ad link aggregation, and this must be configured for each host’s uplinks.
- To provide network switch redundancy, the switches must support stacking or functionality similar to Virtual Port Channel.
- Beacon probing can’t be used.
- Standby or unused uplinks can’t be configured.
- Only one Etherchannel is supported per vNetwork Distributed Switch (vDS).
- The vSwitch can be configured with between one and eight uplinks.
- Effective load balancing requires many source/destination combinations.
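As a rough illustration of the host-side prerequisites, the sketch below uses the pyVmomi Python SDK to confirm that every port group on a vDS uses IP hash teaming and defines no standby uplinks. It assumes you already have a connected session and a vDS object, and the attribute paths are what I believe the pyVmomi object model exposes, so treat it as a starting point rather than a definitive check.

```python
def check_ip_hash_prereqs(dvs):
    """Hypothetical helper: verify every port group on a vDS uses IP hash
    teaming with no standby uplinks. 'dvs' is assumed to be a pyVmomi
    DistributedVirtualSwitch object; confirm attribute names against your
    vSphere API version."""
    problems = []
    for pg in dvs.portgroup:
        teaming = pg.config.defaultPortConfig.uplinkTeamingPolicy
        if teaming.policy.value != "loadbalance_ip":
            problems.append("%s uses %s, expected loadbalance_ip"
                            % (pg.name, teaming.policy.value))
        if teaming.uplinkPortOrder.standbyUplinkPort:
            problems.append("%s has standby uplinks, which IP hash does not allow"
                            % pg.name)
    return problems
```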
Configuring Etherchannel and IP Hash Load Balancing is a technically complex process that can be error prone if the correct procedure is not followed. It is easy to knock a host’s management interfaces off the network during configuration (refer to VMware KB 1022751). It is very hard to achieve an even balance, and very easy for one or more uplinks to become congested while others remain lightly loaded, due to the nature of the IP hashing. In many cases there may be no performance gain, even though there is additional overhead in calculating the IP hash for every conversation.
The best use case I can think of for IP Hash Load Balancing is a download server or a single very high traffic web server where no other load balancers are involved and it is not possible to deploy multiple VMs behind load balancers for the purpose (which leaves a single point of failure). In this case you may achieve good balance and utilization of multiple links, even though the load balancing is not dynamic and there is little control over congestion. But is the additional technical complexity really worth it for such a small use case? Do you truly need to achieve more throughput from a single VM than a single uplink can sustain? In an environment with many VMs consolidated onto the host, do you want a single VM to be able to monopolize host networking to the detriment of others?
If your only reason for wanting to use Etherchannel and IP Hash Load Balancing is to distribute load over multiple host uplinks and provide redundancy and failover, then it is most likely not the best choice. In fact, if this is your only objective, any of the other teaming methods would be a better choice (excluding explicit failover order). The complexity and limitations, when in most cases there will be no performance gain, don’t seem to make it worthwhile. This brings us nicely to Load Based Teaming.
Load Based Teaming
Load Based Teaming, or Route based on physical NIC load, is an option on the vDS that has been available since vSphere 4.1. It is a more dynamic teaming algorithm that evaluates uplink utilization every 30 seconds and, if utilization exceeds 75%, moves VM vNICs between host uplinks. LBT works on standard access or trunk ports and does not require any special switch configuration, such as stacking or Virtual Port Channels. Each VM will be limited to the bandwidth of a single host uplink. LBT considers both ingress and egress traffic flows. It is incredibly simple to set up and use. Frank Denneman wrote about LBT when it was first released and then followed up with IP-Hash versus LBT, as previously mentioned.
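The decision logic is simple enough to describe in a few lines of Python. The sketch below is purely illustrative pseudologic based on the behaviour described above (a 30-second evaluation interval and a 75% saturation threshold); it is not how ESXi implements LBT internally, and the callback names are hypothetical.

```python
import time

EVAL_INTERVAL_SECS = 30      # LBT evaluates uplink utilization every 30 seconds
SATURATION_THRESHOLD = 0.75  # ...and only acts once an uplink exceeds 75% utilization

def lbt_style_loop(uplinks, get_utilization, move_one_port):
    """Illustrative LBT-style loop: when an uplink is saturated, move one of
    its virtual ports to the least-utilized uplink. 'get_utilization' and
    'move_one_port' are hypothetical callbacks standing in for hypervisor internals."""
    while True:
        usage = {u: get_utilization(u) for u in uplinks}
        for uplink, load in usage.items():
            if load > SATURATION_THRESHOLD:
                target = min(usage, key=usage.get)  # least-loaded uplink
                if target != uplink:
                    move_one_port(src=uplink, dst=target)
        time.sleep(EVAL_INTERVAL_SECS)
```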
The advantages of LBT are:
- Its simplicity to set up and use on both the hosts and the network switches.
- Greatly reduced configuration.
- Significantly easier troubleshooting.
- The dynamic utilization aware nature of the load balancing.
- Works on both egress and ingress traffic.
- Lower overheads than IP Hash Load balancing.
- Reduced chances of Network IO Control (NIOC) having to take action to control traffic congestion, whereas NIOC may have to play a more active part when IP Hash is being used.
- None of the limitations of IP Hash Load Balancing and Etherchannel.
The only downside is that a single VM vNIC is limited to the bandwidth of a single host uplink. For a VM to effectively utilize multiple host uplinks it would need to be multi-homed by configuring it with multiple vNICs. This is a very small downside when the benefits are so great for the majority of workloads and situations, and the configuration is so simple to set up and use.
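To illustrate that simplicity, here is a minimal pyVmomi sketch that sets Route based on physical NIC load as the default teaming policy on a distributed port group. It assumes an already-connected session and an existing port group object 'pg'; the class names follow the pyVmomi object model as I understand it, so check them against your SDK version. Note there is nothing to configure on the physical switches.

```python
from pyVmomi import vim

def enable_lbt(pg):
    """Set 'Route based on physical NIC load' on a distributed port group.
    Assumes 'pg' is a vim.dvs.DistributedVirtualPortgroup obtained from an
    existing pyVmomi session; no switch-side changes are required for LBT."""
    spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec()
    spec.configVersion = pg.config.configVersion
    port_config = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy()
    teaming = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy()
    teaming.policy = vim.StringPolicy(value="loadbalance_loadbased")
    port_config.uplinkTeamingPolicy = teaming
    spec.defaultPortConfig = port_config
    return pg.ReconfigureDVPortgroup_Task(spec)
```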
What about LACP?
If you have the Nexus 1000v vDS in your environment (or the vSphere 5.1 vDS) and you have switching infrastructure capable of supporting Virtual Port Channels, then you may benefit from using LACP. LACP with the Nexus 1000v has 19 different hashing algorithms (the vSphere 5.1 vDS has only one). LACP still suffers from the same technical complexity as Etherchannel and some of the same limitations, such as not being able to span switches without special configuration and support. However, if you are using the Nexus 1000v you have already chosen a somewhat more complex configuration in return for the other features and benefits it provides. The additional load balancing methods offer a much greater chance of achieving an even balance in many more situations than with static Etherchannel, even though a single conversation is still limited to a single host uplink. If you have the Nexus 1000v and infrastructure capable of supporting LACP across multiple switches, so the switch is not a single point of failure, then it would be a superior choice to MAC pinning.
Use Load Based Teaming unless you have no other option, and even then you should seriously consider not using Etherchannel and IP Hash Load Balancing. The complexity, increased overheads and lack of probable real-world benefits make IP Hash a poor choice for most use cases. Remember that LACP is not supported on the native VMware vDS prior to vSphere 5.1, and I think that even if VMware extends LACP support in future, the case for LBT in preference to LACP would still be strong. I would be interested to hear your thoughts on this topic.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster. Copyright © 2012 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.