Over the last couple of weeks the debate about whether to enable Jumbo Frames has been coming up quite a bit, largely driven by discussion around VSAN in vSphere 5.5 and other types of network-based storage access. A couple of people have questioned the wisdom of recommending "enable Jumbo Frames" as a best practice for VSAN, due to a perceived negligible benefit compared to the perceived complexity involved in implementing it (I will drill into these concerns below). I was quick to point to various test results showing at least a 10% benefit in performance, including my previous articles Jumbo Frames on vSphere 5 and Jumbo Frames on vSphere 5 Update 1. However my previous testing was not for storage access; it was simply network performance. So I thought maybe there is a difference when you're using NAS storage, or VSAN-type storage over a 10G network, and maybe Jumbo Frames doesn't make all that much difference in that scenario. To find out, I tested some storage access scenarios over the 10G LAN in My Lab Environment to see whether Jumbo Frames made any difference or not.
I've heard reports that some people have been testing VSAN and seen no noticeable performance improvement when using Jumbo Frames on the 10G networks between the hosts. Although I don't have VSAN in my lab just yet, my theory is that the network is not the bottleneck with VSAN. Most of the storage access in a VSAN environment will be local; only the replication traffic, and traffic when data needs to be moved around, will go over the network between VSAN hosts. The latency introduced by the network in those cases would be negligible compared to the cost of accessing the local host storage. Then there is the argument that LRO/LSO on modern 10G NICs negates the benefit of Jumbo Frames. The thing is, LSO only helps with outbound traffic, and LRO doesn't deal entirely with the overheads associated with per-packet processing of inbound packets. At least Linux has LRO support, whereas Windows doesn't yet. And VSAN storage traffic isn't the only thing going across the network.
Does any of this really matter when it comes to setting best practices? Not entirely. But to explain that, first we need to look at what best practices actually are. Let's take a look at what the VCDX Boot Camp book by John Arrasjid, Ben Lin and Mostafa Khalil, and my quote on page 20 of that book, say about best practices.
"Use of best practices may apply for a majority of implementations, but these are not customer specific or applicable in all situations. A qualified design expert knows when to deviate from best practice while providing a justifiable and supportable solution." I go on to elaborate on that point by saying: "Best practices are a baseline from which we work in the absence of specific requirements that would justify deviation. Knowing why it is a best practice is important so that you know where to create a new best practice specific to your design and customer."
So what we can immediately take from the above is that best practices, while beneficial to the majority of implementations, may not be applicable in all situations. This is the case with Jumbo Frames, as there are a lot of dependencies. If the network or CPU is not the bottleneck, then it's unlikely Jumbo Frames by itself will be a silver bullet for your application or storage access performance issues. It will also depend on the switching infrastructure and NICs in use. But for the majority of situations Jumbo Frames could be beneficial, especially when using 10G or higher bandwidth network infrastructure. As you'll see shortly from my test results, Jumbo Frames improved performance significantly in the situations I tested in My Lab Environment.
Next we need to look at what the trade-offs are. If I enable Jumbo Frames for VSAN or other types of traffic, are there any downsides? Firstly, it has to be enabled end to end in the network communication path in order to be effective. That means the VMs, the VMkernel ports transmitting VSAN or vMotion traffic, the virtual switches, and the physical switches or routers all need to be configured to accept Jumbo Frames. Why 10G-plus equipment doesn't come out of the factory configured to accept Jumbo Frames I don't know. In any case the necessary configuration is a trivial exercise when setting up new infrastructure, but retrofitting existing network switches, routers and the virtual environment on a large scale, if it wasn't done originally, can be a little harder and more complex. In the context of 10G+ storage networks and vMotion networks, which are meant to be flat and closely connected, I would argue it isn't that much trouble. But we must accept there is more to configure; this is one trade-off. A sketch of the vSphere side of the configuration is below.
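To give a sense of what's involved on the vSphere side, here is a minimal sketch using the ESXi 5.x command line (the vSwitch and VMkernel interface names are placeholders for whatever your environment uses; on a distributed switch the MTU is set on the vDS itself via the vSphere Web Client instead):

    # Allow Jumbo Frames on a standard vSwitch by raising its MTU
    esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000

    # Raise the MTU on the VMkernel interface carrying the storage or vMotion traffic
    esxcli network ip interface set --interface-name=vmk1 --mtu=9000

    # Confirm the configured values
    esxcli network ip interface list

The physical switch ports need at least the same MTU; switches are often configured slightly higher (9216 on my Dell 8024, for example) to allow for frame overheads.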
But what if you enable it for only part of the network path, or you make a mistake with the Jumbo Frames configuration? This is a very common concern or objection to enabling Jumbo Frames. In this case you're going to get packets fragmented down to the standard frame size, as if Jumbo Frames wasn't enabled at all. Thanks to Path MTU Discovery, that's pretty much it. You're no worse off having set Jumbo Frames than you would be if it were not set, so no harm done; Jumbo Frames will just not be used. Really, all configuring Jumbo Frames does is raise the upper limit on the Maximum Transmission Unit above the standard frame size (normally 1500 bytes); it has no impact on the minimum size of a frame that is sent. If you do make a mistake, troubleshooting where it's gone wrong isn't that difficult either with vSphere 5.1 and above, as you can use the network health check, or pings with the don't-fragment option set and an 8972-byte payload (8972 bytes of payload plus 28 bytes of ICMP and IP headers makes a full 9000-byte packet), to find out which part of the network path is incorrectly configured, as sketched below.
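As a minimal sketch, the verification from an ESXi host looks something like this (the target address is a placeholder for the far-end VMkernel port):

    # Ping the far end from the ESXi shell with don't-fragment set and a
    # payload sized so the resulting packet is exactly 9000 bytes
    vmkping -d -s 8972 192.168.1.20

    # If that fails but a standard-size ping succeeds, something in the
    # path is not yet configured for Jumbo Frames
    vmkping -d -s 1472 192.168.1.20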
So from the above I would argue it's reasonable to keep Jumbo Frames as a best practice recommendation when using Ethernet-based storage access on 10G-plus networks, even with VSAN and other technologies that behave similarly. It won't cause any harm, and in many situations it will be of benefit. As bandwidth scales, Jumbo Frames may provide even more benefit, depending on how NICs and switches develop. But just how much benefit? Is it really worth it? Now it's time to review my test setup and my test results.
For my testing I used two of my ESXi 5.0 Update 2 hosts with the following config: Dell T710s with 2 x X5650 CPUs (6 cores per socket, 2.66GHz), 72GB RAM, and an Intel X520-T2 10G NIC. No advanced settings (such as interrupt coalescing) were changed on the hosts with regard to the 10G NICs; the hosts' NICs were at their default settings. Both hosts were connected to a Dell 8024 10G switch. The 8024 switch is configured for Jumbo Frames (MTU 9216), and the vDS that my test VMs are connected to is also configured to allow Jumbo Frames (MTU 9000). During the tests I was changing the endpoints to either accept or not accept Jumbo Frames. For the IO load generator and storage server I used two VMs, one on each host, each configured with 6 vCPUs and 8GB RAM. The VMs were configured with VMXNET3 vNICs, and interrupt moderation was disabled on the vNIC driver within the OS. The VMs used PVSCSI vSCSI adapters with the default settings.
Each VM was on a different host during the testing. To drive the storage load I used IOMeter. The storage server had a single thin VMDK backed by either a Micron or Fusion-io flash PCIe card. The IO workload pattern was 100% sequential read with 32 outstanding IOs. I varied only the IO size and whether or not Jumbo Frames were used during each test run, as the tests were primarily to see the impact of the network configuration, not of different storage IO patterns. I took multiple test runs, and the results are the lowest of the runs for each IO size. All measurements are taken from the IOMeter logs on the IO load generator VM.
Here are my results; your mileage may vary:
You can see from the results that as the bandwidth of the 10G NICs reaches saturation point (64K IO size) there is almost no difference between Jumbo and Non-Jumbo in terms of throughput and latency. Up until that point, however, there is between a 9% and 23% improvement in IOPS (and throughput) due to Jumbo Frames, and between a 9% and 32% improvement in latency. I also found that the CPU cost with Jumbo Frames was lower than with Non-Jumbo. To achieve the same throughput and latency in the 64K IO size test the client used 40% more CPU with Non-Jumbo than with Jumbo (13.9% CPU utilization Non-Jumbo vs 9.9% with Jumbo). The offload capabilities of the 10G NIC did help contain the CPU cost of the Non-Jumbo tests: even during the very high packet and throughput rates of the Non-Jumbo tests, CPU utilization did not exceed 16% on the IO load generator VM.
One of my readers was also kind enough to supply some test results comparing Jumbo Frames to Non-Jumbo Frames for NFS-based storage on a Cisco 2020 10G infrastructure. The graph below shows a consistent 11% improvement in IOPS in their testing; 8204 is the Non-Jumbo test run and 8205 the Jumbo run.
In addition to the IOPS improvement, the test results showed that with Jumbo Frames 85% of IOs were serviced within 500µs, vs 65% within 500µs for Non-Jumbo. CPU utilization on the NFS filer was 60% during the Jumbo test vs 80% during the Non-Jumbo test. These tests demonstrate higher IO throughput, lower latency and lower CPU utilization, as the majority of my tests did. This is further evidence in favour of Jumbo Frames, and in support of it being a best practice for Ethernet-based storage and high throughput 10G-plus networks.
Final Word
Based on my test results and findings above, I would recommend Jumbo Frames be enabled on 10G+ networks, especially when using Ethernet-based storage access, vMotion, and other high throughput applications (such as Oracle RAC interconnects). This includes when using technologies like VSAN. I see no harm in this being recommended as a best practice, provided customers and partners understand what best practices are and also understand Jumbo Frames. If you're designing a new infrastructure around NAS, VSAN or other Ethernet-based storage access, then there is very little overhead in including Jumbo Frames up front. In my view the applicability of Jumbo Frames is only going to increase with the adoption of network overlay or network virtualization technologies such as VMware NSX and VXLAN, which add encapsulation overhead to every frame. So even if you think you can get away without Jumbo Frames now, it's very likely to be in your future.
I would be interested in your feedback, and in any other test results using Jumbo Frames on 10G or higher bandwidth networks with NAS and VSAN-type storage environments.
—
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster. Copyright © 2013 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.
Great analysis Michael. It's true it can be a "should we or shouldn't we" situation, and there is an inbuilt *fear* of jumbo frames, if that's the correct word. So it's very useful to debate the pros and cons and arrive at a definitive conclusion, which I think at least moves it on from the typical customer scenario. I think your points are fair, reasonable and well argued.
In my experience, even with 10Gb/s, it doesn't always get configured on Day 1 due to fear of adverse effects on the network, and ironically it may get turned on when performance becomes a problem. It can also be due to the lack of black and white guidance, where a vendor might only "suggest" turning it on.
I have seen even physical Windows servers performing non-optimally due to the native TCP window size. A particular case in point was a multi-threaded application replicating data over a WAN, performing a dedupe hash-lookup and then sending the deltas across. Windows performed optimally after an increase in the number of parallel TCP connections, despite the fact it was maybe only pushing 80Mb/s on a 10Gb/s EtherChannel. I know that's an aside, but I suppose it supports the view that there are worse things that can happen, and other areas that warrant further tuning, and maybe it's better to just turn it on and take every percent available.
Paul Meehan
@PaulPMeehan
Does fragmentation or PMTUD work when both end points are on the same subnet?
Hi Ryan, yes, it works regardless of whether the network path is within the same subnet or to a remote subnet. My testing was based on both systems being on the same subnet.
I'm fairly certain that both IP fragmentation and PMTUD require a layer-3 hop in the path.
I may have been slightly mistaken in my previous comment. The discovery of the MTU difference works on the local subnet between peers, but it technically might not use PMTUD, which, as you point out, is used when crossing routed networks at L3. I haven't been able to find the mechanism that is most commonly used between subnet peers, although I've read a couple of RFCs that propose solutions, such as RFC 4821 – Packetization Layer Path MTU Discovery – and a proposal for Jumbo ARP. If you can find the actual mechanism that is used I'd be interested to hear more about it. In any case, peers on the local subnet (assuming a modern network stack) can figure out their MTUs and send packets at the appropriate size. One way to check empirically what actually passes on a given path is sketched below.
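Assuming a Linux endpoint, a quick probe like this steps through payload sizes with the don't-fragment bit set (the target address is a placeholder; add 28 bytes of ICMP/IP headers to each payload size to get the packet size on the wire):

    # The largest size that gets a reply indicates the usable path MTU
    for size in 1472 4000 8000 8972; do
      ping -M do -c 1 -s $size 192.168.1.20 > /dev/null 2>&1 \
        && echo "$size bytes: OK" || echo "$size bytes: dropped"
    done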
At VMworld 2013 I heard both sides of the argument, with different recommendations, in various sessions. One session said that the increased risk of human error in enabling jumbo frames outweighs the slight performance boost you get. Yes, the initial config of JF is easy, but down the line when switches get replaced or other changes are made, you can goof up the config, and breaking NFS/iSCSI storage would be a very bad day.
In another session they said that jumbo frames are easy to configure, most hardware supports it today, and why not do it? As I recall, some first-generation HP Flex10 modules would not support the full 9K jumbo frame.
As you point out, there's no right or wrong answer. Personally I use jumbo frames on our UCS blades and non-jumbo frames on our legacy HP blade servers.
Hi Derek, I think the advice and scaremongering around Jumbo Frames is just plain wrong. It won't break or mess up NFS or iSCSI storage; it'll just put you back into the same position you would have been in if Jumbo Frames were not enabled (higher latency, higher CPU usage, lower throughput). Any settings changed from defaults should be documented and known by the teams operating the environment, so they can be taken care of during a hardware refresh. This type of thing is even more reason why all network equipment should come from the factory with Jumbo Frames enabled by default. The protocols involved already deal with variations in MTU size along the network paths; if this didn't work reliably, the Internet as we know it would be completely broken, as there are many different MTUs in use all over the place. Yes, there is a risk of human error, there always is when humans are involved, but the impact is negligible and no worse than not having configured Jumbo Frames in the first place. Based on my test data and real world experience in NFS/iSCSI environments, Jumbo Frames improves performance with little impact or trade-off and is worth configuring in the cases I have outlined. But your example is a great one where some devices may not be able to use the full jumbo frame size, which is fine; that is why Path MTU Discovery was invented. Those devices can continue to send at a lower MTU and everything else can still benefit from the larger size.
Having worked a support desk for an iSCSI/NFS storage product, I would recommend not having Jumbo Frames as a standard. While Path MTU Discovery allows devices to still communicate, it introduces latency, and with storage packets that is bad. Jumbo frames are a subnet-wide configuration, and as such it can be very difficult for less skilled admins to keep configurations consistent. A common configuration error, and subsequent storage performance troubleshooting nightmare, comes when equipment is replaced. Consider a VSAN deployment on several physical ESX hosts. If there is a hardware issue on a host and it is replaced by another physical device, the jumbo frames config could be missed. Now you have VSAN nodes talking to each other at different MTUs. Latency goes up, performance goes down, and frustration on the administration team goes through the roof. To me the issues that I have seen with the configuration are not worth the ~10% performance boost. Also, in several tests that I have run, the ~10% increase from jumbos is not a blanket result across IO workloads; some workloads show the same or slightly decreased performance with Jumbo Frames. Throw Murphy's Law and PEBCAK errors in there, and in my mind it falls into advanced config, not standard config.
I would still argue that, given Path MTU Discovery will occur anyway, having Jumbo and a configuration error is no worse than not having Jumbo at all. Further, there are protections that can easily be used to avoid configuration errors, Host Profiles being one. Also, 10% on multiple 10G or 40G connections is a lot, which is why I kept this discussion to 10G-plus networks; on 1G it's just not worth it IMHO. Jumbo is going to be a reality for every environment in the future as SDN takes hold, so we might as well start getting used to it. Specifically for the VSAN use case, there is a network health check built into vSphere to find and alert on config errors, which makes finding the problems easy. A quick manual cross-check is sketched below.
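For example, the configured MTU of each VMkernel interface can be listed from the ESXi shell (a sketch, not a substitute for Host Profiles or the health check; a host that has drifted back to 1500 after a hardware replacement stands out immediately):

    # List all VMkernel interfaces along with their configured MTU
    esxcli network ip interface list | grep -E "vmk|MTU"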
There's something not quite right here about Path MTU Discovery always working. I have a flat (i.e. L2, not L3) multi-site bridged network with Jumbo Frames enabled on all paths. When I was switching on jumbo frames I experienced a complete iSCSI traffic stall between devices in different sites (iSCSI replication from Compellent to Compellent). Only once the switch ports for the site-to-site links had jumbo frames enabled did the replication traffic flow again.
It's been a while since I did any serious Cisco work, but I think the problem is occurring here precisely because there's no L3 device between my two storage controllers. From [http://en.wikipedia.org/wiki/Path_MTU_Discovery]:
For IPv4 packets, Path MTU Discovery works by setting the Don't Fragment (DF) option bit in the IP headers of outgoing packets. Then, any device along the path whose MTU is smaller than the packet will drop it, and send back an Internet Control Message Protocol (ICMP) Fragmentation Needed (Type 3, Code 4) message containing its MTU, allowing the source host to reduce its Path MTU appropriately. The process is repeated until the MTU is small enough to traverse the entire path without fragmentation.
In this case there are no "devices" along the path, so the sender never gets back any ICMP response; it keeps sending jumbo-sized packets and just never gets a reply.
As far as I can figure out, the example you provided of devices on the same subnet working this out actually just uses the same ICMP mechanism.
I.e. device 1 sends a jumbo frame to device 2; all the L2 switch ports along the way on the local subnet support jumbo frames, so the frame gets to device 2 without problems. Device 2 isn't configured for jumbo frames, so it sends an ICMP message back to device 1, which steps back its frame size when communicating with device 2. The key here is that the switches all support jumbo frames and have passed the frame successfully; if they don't, that's when problems occur.
Hi Charles, I think what you've experienced is similar to what I experienced when changing back and forth between Jumbo and Non-Jumbo. It takes a period of time for the endpoints to adjust to the difference in MTU. When I was doing this in my network it took a matter of seconds, but in a larger environment it could take longer. You're right that it's not using Path MTU Discovery, as there are no L3 devices in the path. This assumes the endpoints are updated at different times and are not both set to the same MTU. Part of the testing I did was with one endpoint configured for Jumbo and one for Non-Jumbo, and in those cases they were on the same L2 segment. So when changing endpoints you should allow for this period of adjustment.
The first time I tried this I got it wrong and didn't update the site-to-site switch ports at all; 4 hours later I still had no replication traffic between sites. I'm pretty sure this isn't just a timing issue: if there's a pair of switches between any two devices that don't have jumbo frames enabled, then those switches just silently drop the traffic without informing the originating device at all.
I agree that there's too much fear over Jumbo frames and that we should just expect it to work out of the box but I'm still convinced there are edge cases where it doesn't work as expected and that these will continue to be a problem until we have a better solution to automated discovery and provisioning of network devices (Roll on SDN…)
Hi Charles, yes, that is the behaviour of some makes and models of switches from what I've seen. They will simply drop packets that exceed the configured MTU on egress ports, and in this case PMTUD won't be effective, as ICMP responses aren't normally generated for traffic that is switched rather than routed.
For this to occur I would have expected that the endpoints were trying to send Jumbo before the L2 switches were updated; whenever configuring Jumbo I recommend the physical switches are updated first, before any endpoints. So while I think Jumbo Frames are beneficial and there is too much FUD around them, the pros and cons and the implementation plan need to be considered carefully, especially when retrofitting into an existing environment. When setting up a new environment it's much simpler. In spite of that, the performance benefits, based on tests that I and others have conducted on 10G-plus networks, outweigh the downsides, assuming a careful and considered approach is taken to the implementation. Given the possible combinations and edge cases, though, your mileage may vary.
When you did get it working, what benefits, if any, did you see for your replication traffic?
One more comment: On one of these L2 switched paths we had a Gigabit ethernet port provided to us by a WAN service provider.
On this path we couldn't get jumbo frames enabled at all until the service provider had enabled them on their network. Of course, then we couldn't use the maximum jumbo frame size on our network, because once the service provider did their QinQ encapsulation our max-size frames exceeded their max frame size.
We had to step our max MTU down to 8192 (the next largest commonly accepted size across all of our equipment).
At the end of the day we got a minor (<10%) bump in throughput on our iSCSI traffic locally between ESXi and storage, a bit more of a bump in NFS traffic between Solaris hosts, and no apparent change at all for the Compellent replication. So it was a good bit of work for not a whole lot of benefit, but it does mean we have implemented "best practice" for both Compellent and EqualLogic, and that has helped when making service calls.
Hi Charles, that's great feedback, thanks for sharing. All the testing I've ever done on 1GbE NICs has shown only a very minimal benefit in performance and slightly lower CPU usage; the benefit on 10G is much more evident (again, based on my testing and experience). Also, replication traffic over a WAN is affected by the bandwidth-delay product, which could also limit the performance you see.
We also know we have some issues with some of our storage where we need more IOPS, which we are working on. I'm pretty sure that once those are resolved we'll see more throughput; of course, I won't be able to qualify that increase unless I switch back to non-jumbo frames temporarily (which, whilst an interesting exercise, might not endear me to the rest of the firm!).
With Nutanix + Arista (where Jumbo is enabled by default on Arista switches) this is a no brainer! Great work Mike!
Good findings Michael, especially when combined with feedback from the community. To mitigate one of the issues mentioned above, about equipment failure and replacement equipment being misconfigured, I think having a simple checklist could stop some of these misconfigurations from happening in the first place.
Very interesting discussion. I will do some further research as a result.
I'm wondering how switching technologies impact jumbo frames, e.g. cut-through vs. store-and-forward. Is this a factor in which brands of switches drop jumbo frames when not configured for them? Also, is it safe to assume that routers won't drop jumbo frames, but rather just perform fragmentation and increment the jumbo frame counter when viewing the interface output, or will they also increment errors? I might do some tests, as all this has me curious now.
p.s. I work for an ISP and we just keep standard 1500 mtu everywhere.
Just enabled Jumbo Frames of size 9000 on an aggregated (balance-rr) direct connection over two 1GbE NICs between two servers.
Here are the stats I achieved doing some simple tests, copying a 12GB directory between them using netcat.
1500 MTU: 177 MB/s (1416 Mbit/s)
9000 MTU: 223 MB/s (1784 Mbit/s)
The theoretical max is 250 MB/s, so switching to Jumbo Frames improved things a lot compared to the standard MTU.
There are lots of limiting factors in the test though: disk I/O, using tar on the remote ends (which is single-threaded and CPU limited), a single TCP stream, etc.
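For reference, a test along these lines is typically run with a simple tar-over-netcat pipeline (the hostname, port and path here are placeholders, and some netcat variants need -l -p for the listening port):

    # On the receiving server: listen and unpack the incoming stream
    nc -l 5000 | tar xf -

    # On the sending server: pack the directory and stream it across
    tar cf - /data/testdir | nc receiver.example.com 5000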