I read a great blog post a while ago from Jason Boche titled Jumbo Frames Comparison Testing with IP Storage and vMotion. The results of those tests showed at best marginal gains from using Jumbo Frames with 1Gb/s NICs on ESXi 4.1. Based on that, and a lot of discussion that came out of PEX 2012 regarding Jumbo Frames, I decided to conduct my own tests to see if the results were any different when using modern 10G switches and NICs. Some of the results were not what I expected.
Previous testing I had conducted in customer environments with 10G switches and NICs had shown anywhere from 10 – 30% improvement in raw throughput, as well as lower latency and improved CPU efficiency. A lot of the performance characteristics are OS dependent, and in the case of Linux will depend on how you’ve tuned your kernel. Both switching equipment and NICs have improved a lot over the last couple of years, so it’s possible the differences I found in performance between MTU 9000 and MTU 1500 reflect that as well.
For the testing in my lab I wanted the tests to be as valid as possible without changing more than necessary, as quite a lot of my deployed infrastructure relies on Jumbo Frames (such as my storage). My entire underlying network infrastructure in my lab, including my routing switches, is configured for Jumbo Frames. The vSwitches used for the VMs and VMK ports are also configured for Jumbo Frames and were not modified during the tests. So my testing was limited to changing the MTU settings on the VMK ports used for vMotion and the NIC MTU settings in the Guest OSs I tested.
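For reference, the VMK port MTU change can be made from the ESXi 5 command line. This is just a sketch; the interface and vSwitch names (vmk1, vSwitch1) are assumptions and will vary per environment:

```shell
# Assumptions: standard vSwitch named vSwitch1, vMotion VMkernel port vmk1.
# Set the VMkernel port MTU to 9000 (jumbo), or back to 1500 for the non-jumbo tests:
esxcli network ip interface set --interface-name=vmk1 --mtu=9000

# The parent vSwitch must also be configured to carry jumbo frames:
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000

# Verify the current MTU settings:
esxcli network ip interface list
```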
Also check out my other article titled Jumbo Frames on vSphere 5 U1.
Lab Test Hardware:
Host Type: 2 x Dell T710, 72GB RAM, Dual Intel Xeon X5650, Intel X520-T2 Dual Port 10G NIC
vSphere Version: vCenter 5.0 GA, ESXi 5 – Build 515841
Network Switch: Dell PowerConnect 8024 – 24 Port, 10G-BaseT
- Windows 2008 R2 Enterprise 64bit – 32GB RAM, 3 vCPU, VMXNET3
- SLES Linux 11 SP1 64bit – 16GB RAM, 3 vCPU, VMXNET3
- Windows 2003 Standard 32bit – 4GB, 2 vCPU, VMXNET3
Additional details regarding My Lab Environment.
Lab Test Script:
For the vMotion Tests I used the Windows 2008 R2 systems while they were running a Prime95 x64 Torture Test. This ensures that as many memory pages as possible are changing as fast as possible. This places a lot of stress on vMotion, which will extend the migration times and should fully utilize the 10G NICs. The hosts start with a single VMKernel NIC port configured for vMotion and for Jumbo Frames. A second VMKernel port is configured on a separate port group, ready when needed for the testing. I ran each test multiple times and used the averages from the best run as the results.
- Start RESXTOP from vMA against both hosts in batch mode and record the output from both Test vSphere Hosts.
- Power on two Test VMs on Test Host 1 and start the torture test on both VMs
- vMotion both VMs to the destination host (Test Host 2) at the same time
- vMotion both VMs back to the source host (Test Host 1)
- Repeat steps 3 and 4
- Enable the second VMKernel vMotion port on each of the test hosts
- Repeat steps 3 – 5
- Modify VMKernel Port MTU to 1500 on both VMKernel ports on both test hosts
- Repeat steps 3 – 5
- Disable the second VMKernel vMotion port on each of the test hosts
- Repeat steps 3 – 5
- Reset hosts to original configuration
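Step 1’s resxtop capture can be kicked off from the vMA along these lines; a sketch, with hypothetical host names and output file names:

```shell
# Run from the vMA against each test host; hostnames and filenames are placeholders.
# -b = batch mode, -d 5 = 5 second sample interval, -n 120 = 120 samples (10 minutes)
resxtop --server esx01.lab.local -b -d 5 -n 120 > esx01-vmotion-test.csv &
resxtop --server esx02.lab.local -b -d 5 -n 120 > esx02-vmotion-test.csv &
```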
For the Guest OS Network Performance Tests I used iPerf, an open source network performance test tool. Because Windows 2003 does not support receive side scaling, I used 10 parallel streams to get its performance results; with both SLES and Windows 2008 R2 I used a single stream.
Guest OS Network Performance Tests
- Power on First Test VM on Test Host 1
- Power on Second Test VM on Test Host 2
- Configure each VM to MTU 1500
- Start iPerf in Server Mode on Test VM on Test Host 2
- Start iPerf on Test VM on Host 1 to commence the test
- Record the results
- Configure each VM to MTU 9000
- Repeat steps 4 – 6
Execute the steps above for each of the Guest OSs being tested. Below are the iPerf commands I executed during my tests.
Receiver Node: iperf -s -i 60 -w 1m -f m
Sender Node: iperf -i 5 -w 1m -f m -c <receiver_node_ip>
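The guest-side MTU changes in steps 3 and 7 look roughly like this. A sketch only: the interface and connection names are assumptions, and on Windows 2003 the equivalent change is made through the VMXNET3 adapter’s advanced driver properties rather than netsh:

```shell
# SLES 11 (interface name is an assumption; add MTU='9000' to the ifcfg file to persist):
ifconfig eth0 mtu 9000

# Windows 2008 R2, from an elevated prompt (connection name is an assumption;
# the VMXNET3 adapter's "Jumbo Packet" advanced property must also allow 9000):
netsh interface ipv4 set subinterface "Local Area Connection" mtu=9000 store=persistent
```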
Before starting this testing process I thought I would see a 15 – 20% difference between Jumbo and Non-Jumbo. I based this on previous experience and on the fact that the offload capabilities of 10G NICs, server CPUs and 10G switches have all improved over the last couple of years. The difference was a bit less than I expected, but still a decent amount compared to what might be expected on a 1Gb/s network. I was not able to test Jumbo Frames performance on Windows 2008 R2 due to a bug in the ESXi 5 VMware Tools and VMXNET3 that prevents Jumbo Frames from functioning; see my previous post Windows VMXNET3 Performance Issues and Instability with vSphere 5.0.
The SLES 11 SP1 VM has had quite a lot of tuning from the out-of-the-box configuration, which is probably why its throughput is roughly the same as the vMotion throughput. If you have not tuned your Linux kernel I wouldn’t expect you to get the same performance. The Windows 2003 and 2008 R2 VMs were both out-of-the-box configurations, with only the VMXNET3 driver MTU modified on the 2003 system.
As you can see from the test results, both the Linux VM and the VMKernel port used for vMotion can saturate a 10G link when using Jumbo Frames. The difference between Jumbo and Non-Jumbo on Linux is probably larger than with vMotion because the vMotion VMKernel port is already highly tuned for a single purpose. The Non-Jumbo performance of Windows 2008 R2 was quite close to the Linux Non-Jumbo performance, which shows the improvements Microsoft has made to its IP stack since Windows 2003.
The Bottom Line:
Using Jumbo Frames requires that every device in the network path between source and destination is configured correctly to support MTU 9000: switches, routers, vSwitches and servers/VMs. If Jumbo Frames is not enabled throughout the path you will get packet fragmentation, which will reduce performance back to that of Non-Jumbo. In an existing network that was not built with Jumbo Frames enabled, changing it could involve considerable effort. However, you don’t necessarily have to change it everywhere or on everything, depending upon how your network is segmented. You might consider enabling it only on the network segments, servers and switches that would benefit the most.

When implementing new 10G infrastructure it may be worthwhile configuring all of the new network infrastructure for Jumbo Frames, which is very simple to do during initial configuration. Even though modern NICs and switching equipment have reduced the difference between Jumbo and Non-Jumbo, it can still be worthwhile in a number of cases.
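A quick way to verify that the end-to-end path really carries jumbo frames is a don’t-fragment ping at near-maximum payload (8972 bytes = 9000 MTU minus 28 bytes of IP and ICMP headers). A sketch; the destination IP is a placeholder:

```shell
# From the ESXi Shell: -d sets the don't-fragment bit, -s sets the ICMP payload size.
# If any hop in the path is still at MTU 1500 this will fail rather than fragment.
vmkping -d -s 8972 192.168.1.20

# The Windows equivalent, e.g. for testing between guests:
ping -f -l 8972 192.168.1.20
```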
My results suggest you could get anywhere from a 10% to 13% improvement for normal Guest OS traffic flows and between 8% and 19% for vMotion traffic flows. You will need to decide if the additional throughput, lower CPU usage on servers and network switches/routers, and lower latency are worth the effort. Two traffic flows that can benefit a lot from Jumbo Frames are vMotion and the Oracle RAC private interconnect network. These types of traffic are normally isolated onto separate switches or non-routed VLANs, and so are prime candidates for implementing Jumbo Frames in isolation from the rest of the network. In my 2 NIC vMotion tests the 19% improvement in throughput reduced the migration time by 10 seconds (from 50 seconds down to 40 seconds) for my two 32GB RAM VMs.
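As a sanity check, if migration time scales roughly inversely with throughput, the measured 19% gain predicts a result close to the 40 seconds I observed:

```shell
# Assumption: migration time is inversely proportional to vMotion throughput.
BASE_TIME=50    # seconds to migrate both VMs at MTU 1500
GAIN_PCT=19     # measured jumbo frame throughput improvement
EXPECTED=$(( BASE_TIME * 100 / (100 + GAIN_PCT) ))
echo "Predicted MTU 9000 migration time: ~${EXPECTED}s"   # ~42s, close to the 40s measured
```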
For Oracle RAC in particular, Jumbo Frames is recommended even on 1Gb/s networks, as a single DB block can then fit into a single IP packet, which reduces DB latencies across the private interconnect. With the latest version, Oracle RAC 11G R2, up to 4 private interconnect networks can be used to provide load balancing and high availability. For databases that make heavy use of the interconnect this can provide a big performance boost without having to completely re-architect the database.
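The arithmetic behind that recommendation is straightforward. A rough sketch, assuming an 8KB DB block and standard IP/UDP header sizes (the interconnect typically runs over UDP):

```shell
# Assumptions: 8192 byte Oracle DB block, 20 byte IP header, 8 byte UDP header.
DB_BLOCK=8192
PAYLOAD_1500=$(( 1500 - 20 - 8 ))   # 1472 bytes of payload per standard frame
PAYLOAD_9000=$(( 9000 - 20 - 8 ))   # 8972 bytes of payload per jumbo frame
PKTS_1500=$(( (DB_BLOCK + PAYLOAD_1500 - 1) / PAYLOAD_1500 ))
PKTS_9000=$(( (DB_BLOCK + PAYLOAD_9000 - 1) / PAYLOAD_9000 ))
echo "Packets per DB block at MTU 1500: ${PKTS_1500}"   # 6
echo "Packets per DB block at MTU 9000: ${PKTS_9000}"   # 1
```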
A 10% performance degradation might not sound like much, but when you’re talking about a 10Gb/s network that’s like losing the performance of an entire 1Gb/s link. When you use multiple links it quickly adds up to be a substantial loss of performance. The benefit of Jumbo Frames is only going to grow with the new 40G and 100G Ethernet standards. Let’s just hope that the OS IP stacks are improved enough to cope with the new standards when they start to become mainstream.
I would encourage you to test it out and implement it where appropriate. Not every application or use case needs Jumbo Frames, but there are a couple of good ones that do.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster +. Copyright © 2012 – IT Solutions 2000 Ltd and Michael Webster +. All rights reserved. Not to be reproduced for commercial purposes without written permission.