A little while ago I wrote an article titled 5 Tips to Prevent 80% of Virtualization Problems. That article was all about storage: how to configure it and the dangers to watch out for, because problems in virtualized environments are predominantly caused by or related to storage in one way or another. In it I explained the impact of queue depths on performance, as well as some of the dangers of making the HBA device queue depths too high. What I didn't know at the time I wrote that article was that the default queue depth for QLogic HBAs was changed between vSphere 4.1 and 5.x. This article will bring you up to date on the change in default values between vSphere 4.x and 5.x and its impacts.
As I outlined in 5 Tips to Prevent 80% of Virtualization Problems, the HBA device queue depths are important because they have a big impact on the number of parallel I/Os that a VM can issue and that can be serviced. They also affect the number of LUNs you need in order to achieve the same performance. If your queue depth is too low you can see very high latency when your VMs try to issue a large number of parallel I/Os, as they will all queue up inside the hypervisor. If your HBA device queue depth is too large you could have lots of I/Os queueing up in the HBA itself, or you could overload your storage array. So you need to strike the right balance.
This article is a follow-up to Cormac Hogan's article titled Heads Up! Device Queue Depth on QLogic HBAs, which was published in response to queries VMware had received from one of their Technical Account Managers. I recommend you read Cormac's article for the reasons why the change to the defaults was made, as I won't cover that here.
The following are the default HBA device queue depths when using QLogic HBAs for Fibre Channel or FCoE SAN connectivity:
- ESXi 4.1 U2 – 32
- ESXi 5.0 GA – 64
- ESXi 5.0 U1 – 64
- ESXi 5.1 GA – 64
Note: This change does not affect Emulex HBAs, only QLogic.
VMware KB 1267 (Changing the queue depth for QLogic and Emulex HBAs), which documents the process for changing device queue depths, has been updated to include the default queue depths for these adapters per vSphere version.
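For reference, on an ESXi 5.x host with the qla2xxx driver the procedure from KB 1267 looks roughly like the sketch below. The module name varies by driver and ESXi version, so check what is actually loaded on your host first, and treat this as an illustration rather than a substitute for the KB:

```shell
# Find the loaded QLogic driver module (commonly qla2xxx on ESXi 5.x).
esxcli system module list | grep qla

# Set the per-device queue depth back to 32, for example (assumes qla2xxx).
esxcli system module parameters set -p ql2xmaxqdepth=32 -m qla2xxx

# Verify the parameter, then reboot the host for the change to take effect.
esxcli system module parameters list -m qla2xxx | grep ql2xmaxqdepth
```

A reboot is required before the new queue depth applies, so plan this as part of normal host maintenance.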
So is this change to the default really significant? I think it’s significant in that it wasn’t documented anywhere and in fact the QLogic HBA documentation still lists the default as 32. It’s also significant due to the impact of an overload condition can be quite a dramatic negative storage performance hit, which could take a while to troubleshoot. But for a very long time it had been a common best practice for VMware to recommend changing the HBA device queue depth on QLogic HBA’s to 64 from the default of 32. In most cases this had a positive impact on performance with reduced IO latencies. If you are using Storage I/O Control it will dynamically adjust queue slots between different VM’s on a shared datastore and you don’t need to worry about the device queue depths.
Storage I/O Control takes away the worry and will adjust performance to ensure the latency thresholds are met (30ms by default). If you have vSphere Enterprise Plus (4.1 and above) and you have multiple VMs per datastore, you should be making use of Storage I/O Control. The device queue depth is used when there is only one VM per datastore; Disk.SchedNumReqOutstanding (DSNRO) is used when there are multiple VMs per datastore, in which case the per-device queue depth is ignored. As Paudie O'Riordan, one of VMware's Senior Staff Technical Support Engineers, says: "let the computer (SIOC) make the decision, not the finger and the wind".
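For completeness, on ESXi 5.0/5.1 DSNRO is a host-wide advanced setting. A quick way to inspect or change it from the host shell is sketched below; note that in 5.5 and later it became a per-device setting, so this is version-specific:

```shell
# Show the current host-wide DSNRO value (default 32 on ESXi 5.0/5.1).
esxcfg-advcfg -g /Disk/SchedNumReqOutstanding

# Set it to match a device queue depth of 64, for example.
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding
```

As the article says, if you have SIOC available you are better off letting it manage this dynamically rather than tuning DSNRO by hand.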
However, there are a few cases where a queue depth of 64 had a detrimental impact, largely where non-virtual systems shared the same storage array as the vSphere hosts. In this case the vSphere hosts got a far larger proportion of the array's I/O resources, which could impact the performance of the non-virtual systems. I recommend that, where possible, you don't share storage arrays between your virtual and non-virtual environments, which avoids these types of impacts. Where that is not possible you will need to carefully consider the quality of service and storage I/O isolation requirements, and the impact that high-performance vSphere hosts could have on the overall storage array.
The total queue depth across all devices on a QLogic HBA is 4096. So with a per-device (per-LUN) queue depth of 32 you can support 128 LUNs at full queue depth without queueing in the HBA. If you increase the queue depth to 64 (the new default in 5.x) then you can support only 64 LUNs at full queue depth. You can still configure more LUNs on the assumption that not all of them will use their full queue depth at the same time, effectively overcommitting the queues. But it pays to consider the impact of a large queue depth if all VMs do start issuing I/Os. As Cormac says in his article: "If you hit the adapter queue limit, then you won't be able to reach the device queue depth, and may possibly have I/Os retried due to queue full conditions."
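The arithmetic above is simple enough to sanity-check in a shell, using the 4096 adapter-wide limit discussed in this article:

```shell
# Adapter-wide queue limit for the QLogic HBA, as discussed above.
ADAPTER_QUEUE=4096

# Number of LUNs that can run at full per-device queue depth
# without any queueing inside the HBA itself.
echo "Old default (32): $((ADAPTER_QUEUE / 32)) LUNs"   # 128
echo "5.x default (64): $((ADAPTER_QUEUE / 64)) LUNs"   # 64
```

Doubling the per-device queue depth halves the number of LUNs that can be serviced at full depth, which is exactly why the change in defaults deserves attention.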
Now this is only the HBA queue depth. What about the target ports or storage processor ports on the array? Many array storage processor ports have a queue depth of 2048. You should check with your storage vendor what the Target Port Queue Depth is for your array, if any. As you can see, if an HBA is configured to issue I/Os to only one target port, a single HBA could easily overwhelm the storage processor port, causing a QFULL condition. Fortunately your design should have LUNs configured across multiple target storage processor ports and multiple storage processors to reduce the risk of overloading. So what happens in a QFULL scenario? You can read the QLogic document titled Execution Throttle and Queue Depth with VMware and QLogic HBAs. In essence, the vSphere host will set the queue depth to the minimum, which is 1. You can imagine what that would do to your performance.
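To make the target port fan-in concrete, here is a hypothetical worked example. The host count, LUN count, and 2048 port queue depth are illustrative assumptions, not measurements from any particular array:

```shell
TARGET_PORT_QUEUE=2048   # typical storage processor port queue depth (assumed)
HOSTS=4                  # hosts zoned to this one target port (assumed)
LUNS_PER_HOST=10         # LUNs each host reaches via this port (assumed)
DEVICE_QDEPTH=64         # the vSphere 5.x QLogic per-device default

# Worst-case outstanding I/Os the port could receive if every queue fills.
POTENTIAL=$((HOSTS * LUNS_PER_HOST * DEVICE_QDEPTH))
echo "Potential outstanding I/Os: $POTENTIAL vs port queue: $TARGET_PORT_QUEUE"
# 2560 > 2048, so a sustained burst from these hosts could trigger QFULL.
```

Even this modest configuration overcommits a single port, which is why spreading LUNs across multiple target ports and storage processors matters.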
I recommend you read Cormac's article Heads Up! Device Queue Depth on QLogic HBAs and the QLogic document Execution Throttle and Queue Depth with VMware and QLogic HBAs. Overall the change in default queue depth for QLogic HBAs should be positive for performance in most environments. In some environments, however, you may need to adjust the settings to reduce the risks I have outlined here. It's far better to be armed with this knowledge than to suddenly have your storage performance fall off a cliff and not know what caused it. And to repeat: if you have vSphere Enterprise Plus (4.1 or above) and multiple VMs per datastore, you should be making use of Storage I/O Control.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster +. Copyright © 2013 – IT Solutions 2000 Ltd and Michael Webster +. All rights reserved. Not to be reproduced for commercial purposes without written permission.
Reading your post was a walk down memory lane. I remember implementing many of the details you've described while I was in the DRS team. Storage I/O Control is an amazing feature and I really think that people who run highly consolidated storage systems should upgrade to get access to it. Just set it and forget it.
Apologies for being a bit pedantic here… however, the statement about QD being ignored when there are multiple VMs per datastore had me revisit Duncan's article: http://www.yellow-bricks.com/2011/06/23/disk-sche…
If I understand correctly, the QD value is still relevant even with multiple VMs per datastore. The VMkernel will start throttling the queues down to the DSNRO value only during high I/O times and when a certain condition is met (i.e. Disk.SchedQControlVMSwitches). And when the situation normalizes (i.e. Disk.SchedQControlSeqReqs is met) it reverts to the original QD value.
Now I may be slightly (or completely) wrong here, but that's the point of posting this comment, to get a good understanding of the concept 🙂
Hi Obaid, it's not being pedantic; the pursuit of understanding is a great endeavour. QD has no part to play when SIOC is enabled and, according to the QLogic documentation, when multiple VMs are on the same datastore, which is where DSNRO kicks in. In that case it acts more as an initial indication. That's why I linked through to the QLogic documentation on execution throttling. Duncan's article is correct as far as I understand and indicates the same as I have written; the source of his article and most of mine is Thor, who is a legend when it comes to vSphere storage. I.e. the QD is actually ignored and queues are limited to DSNRO, not QD. To quote Duncan's article: "When two or more virtual machines issue I/O to the same datastore DSNRO kicks in."

There is nothing that says DSNRO will be the same as the queue depth, especially in environments where different devices have different requirements. For example, software iSCSI has a default QD of 128, and flash storage might have a QD of 255. DSNRO might well be set to 128 even though FC LUNs are also attached to the hosts, or DSNRO could be left at the default of 32 even though the device queue depth is higher.

The moral of the story is that SIOC should be used in almost all cases as a best practice, and there are very few if any cases where it should not be used. Using SIOC just takes all the guesswork out of the equation, or as Paudie put it, eliminates the finger in the wind.
Thanks Michael, I appreciate you taking the time to reply. I completely agree: when enabled, SIOC will eliminate the need for manual calculations and monitoring. Although IMHO the QLogic doc does not seem to be entirely correct in implying that QD is ignored when we've got multiple VMs issuing I/O to a single LUN. As Duncan's article clearly states, "When two or more virtual machines issue I/O to the same datastore DSNRO kicks in. However it will only throttle the queue depth when the VMkernel has detected that the threshold of a certain counter is reached".
So in a scenario with default values and without SIOC enabled (because of licensing, for example), I think the VMkernel can throttle the number of I/Os up and down between 64 and 32 depending on the load and the counters on the parameters (shared earlier).
I have found the following vSphere blog quite informative as well on the topic: http://blogs.vmware.com/vsphere/2012/08/advanced-…
Hi Obaid, the VMkernel will throttle to DSNRO, not QD. QD only applies when one VM is on the datastore. So assuming the default for DSNRO, the VMkernel would throttle to 32. There is no mention of the DSNRO value changing in KB 1267. I'll see if I can find out if that has now also changed in line with the change in the default QD for QLogic.