Queues are everywhere: at airports, at restaurants, and on the roads (like the image above). But in virtualized applications, the queues that most often cause us problems are in the storage systems and the IO path. There are queues in the applications, in the operating system (in the storage controller drivers and storage devices), in the hypervisors, and in the back-end storage devices. When you overload your storage queues, bad things happen. As I explained in “5 Tips to Help Prevent 80% of Virtualization Problems”, we need to understand the queues, how they are used by applications, and how to optimize the operating systems and virtualization platforms to get the most out of them. I see the most acute problems when customers attempt to virtualize large, high-performance databases. The best cure in these cases is prevention.
The reason we often see issues with database performance in particular is that databases issue a lot of IOs all at the same time and can easily overload the queues of a single virtual disk, or in some cases multiple virtual disks. A database can quite easily issue hundreds to a few thousand IOs concurrently (using async IO), which, if your database is configured to use only a single virtual disk in the operating system, will cause excess queuing and slow response times. In Windows the OS disk device queues in storport are limited to only 255 outstanding IOs, but many storage devices or virtual disks are limited to 32 or 64 outstanding IO operations per disk. If you overload your disk driver queue (such as LSI SAS or PVSCSI), the operating system will queue the IOs until responses for the previously issued IOs are received. This means longer response times for your applications.
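To make the effect concrete, here is a deliberately simplified sketch (my own back-of-the-envelope model, not a benchmark): IOs beyond the device queue depth wait in the OS queue and get serviced in successive "waves", so a burst larger than the queue depth multiplies the worst-case response time.

```python
import math

def worst_case_latency_ms(outstanding_io, queue_depth, device_latency_ms):
    """Rough model: IOs beyond the device queue depth wait in the OS queue
    and are serviced in successive 'waves' of queue_depth IOs each, so the
    last IO in the burst waits for all the waves ahead of it."""
    waves = math.ceil(outstanding_io / queue_depth)
    return waves * device_latency_ms

# A 512-IO async burst against one virtual disk with queue depth 64
# and 1 ms device latency: 8 waves, so the last IO sees ~8 ms.
single_disk = worst_case_latency_ms(512, 64, 1.0)

# The same burst spread evenly across 8 virtual disks: 1 wave, ~1 ms.
spread_out = worst_case_latency_ms(512 // 8, 64, 1.0)

print(single_disk, spread_out)
```

Real latency behaviour is more complicated than this, but the multiplier is the key intuition: the queuing penalty scales with how far the burst exceeds the available queue depth.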
Because the first queue to be overloaded is usually the disk queue, we recommend splitting high-performance workloads that issue many IOs in parallel across multiple disk devices. This is fine up to a point, but then you may run into the queue depth limits of the virtual disk controller, such as 128 for LSI SAS and a default of 256 for PVSCSI in VMware. To work around this problem in VMware and Hyper-V you can add more virtual disk controllers, up to 4 SCSI controllers in VMware ESXi. In this case we recommend splitting the virtual disks across multiple virtual SCSI controllers, such as PVSCSI. The database files are then split across the different virtual disks; exactly how is determined by the type of database you use. For SQL Server I covered this extensively in the storage chapter of Virtualizing SQL Server with VMware: Doing IT Right (VMware Press 2014).
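The sizing logic above can be sketched as a small calculator: given a target number of outstanding IOs, how many virtual disks and controllers keep you under the per-disk and per-controller limits (using the default PVSCSI limits mentioned above; plug in your own numbers):

```python
import math

def layout(target_outstanding, per_disk_qd=64, per_ctrl_qd=256, max_ctrls=4):
    """How many virtual disks and SCSI controllers are needed so that neither
    the per-disk nor the per-controller queue depth is exceeded by the
    target number of concurrently outstanding IOs."""
    disks = math.ceil(target_outstanding / per_disk_qd)
    ctrls = math.ceil(target_outstanding / per_ctrl_qd)
    if ctrls > max_ctrls:
        raise ValueError("workload exceeds what one VM's controllers can queue")
    return disks, ctrls

# e.g. a database issuing ~1000 concurrent IOs at PVSCSI defaults:
print(layout(1000))  # -> (16, 4): 16 virtual disks over 4 PVSCSI controllers
```

This is only arithmetic, not tuning advice for a specific array, but it shows why large databases so quickly need both many virtual disks and all 4 controllers at default queue depths.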
Here is a test I did a while back that showed the different performance characteristics of virtual disk devices in VMware vSphere.
You can see from the above results that PVSCSI is the best storage adapter to deliver the highest performance in terms of IOPS and throughput and the lowest latency, and it also has the lowest CPU overhead. In Hyper-V we would recommend Generation 2 VMs and SCSI disks.
PVSCSI in VMware vSphere allows you to change the default queue depth for a device from 64 to 256, and the default per controller from 256 to 1024. With 4 controllers, that allows up to 4096 outstanding IOs concurrently per VM. This is covered in VMware KB 2053145, “Large-scale workloads with intensive I/O patterns might require queue depths significantly greater than Paravirtual SCSI default values”, which gives advice on increasing the default queue depths for VMware’s PVSCSI. Increasing the queue depth can improve performance by allowing more IOs to be sent to the back-end storage and can improve response times, provided your back-end storage can respond to more IO in parallel (check with your storage team or storage vendor on the back-end config). Moving a bottleneck from one place to another might not result in an improved experience if the other components are already overloaded.
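For reference, a sketch of the settings the KB describes; verify the exact values and paths against KB 2053145 for your OS version before applying them, and reboot the VM for them to take effect:

```shell
# Windows: raise the PVSCSI queue depth via a driver parameter in the
# registry (run from an elevated prompt, then reboot).
reg add "HKLM\SYSTEM\CurrentControlSet\services\pvscsi\Parameters\Device" ^
    /v DriverParameter /t REG_SZ /d "RequestRingPages=32,MaxQueueDepth=254"

# Linux: the equivalent vmw_pvscsi module parameters, appended to the
# kernel boot line or set in a modprobe configuration file:
#   vmw_pvscsi.cmd_per_lun=254 vmw_pvscsi.ring_pages=32
```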
One note of caution. Over the last 6 months I have been involved in a number of customer incidents running SQL Server databases on various vendor platforms where the queue depths on PVSCSI were at their defaults and the OS was becoming overloaded. In some rare cases the OS queues were overloaded to the point that data appeared unable to persist to storage, resulting in database file corruptions being reported in the Windows event logs and SQL logs, similar to those reported in “Win 2K8 with PVSCSI Critical Issue”. Although I don’t have conclusive evidence that the PVSCSI driver was the cause, in every case increasing the queue depth of the PVSCSI controller in the Windows registry (2008 or 2012) resolved the issue. The problem couldn’t be reproduced reliably in all environments, but customers who saw the issue could reproduce it on demand. My recommendation in this type of case is to log a Sev 1 case with VMware support and your hardware vendor or partner to help diagnose and root-cause the fault. But be aware that VMware doesn’t officially do root cause analysis unless you have Business Critical or Mission Critical Support. If I come across any more cases of this and get a conclusive root cause, or VMware or Microsoft publish a KB on this, I will update this article. We’ve also noticed that disks inside Windows have write cache enabled by default; this is another potential cause of problems, and it is recommended that you disable the disk cache.
Avoid overloading your storage queues if you want good end-user response times for your applications. In VMware vSphere environments use PVSCSI virtual controllers, configure the maximum of 4 of them, and divide your virtual disks across them. Make sure you are using the latest version of VMware Tools and the latest driver versions. Make sure you increase the default queue depths in the Windows registry / Linux drivers. In Hyper-V use Generation 2 VMs with SCSI controllers. Other hypervisors may not require additional controllers, depending on how they are designed. But the operating systems and databases don’t know they are virtualized, so they will work the same way and benefit from more parallel queues and more disk devices. LVMs and striped volumes in the guest OS can allow you to split a workload over multiple disks without increasing the management complexity of the database or applications.
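As one illustration of the striped-volume suggestion, a minimal Linux LVM sketch that stripes a single logical volume across four virtual disks; the device names and sizes are assumptions, so adjust them for your environment:

```shell
# Assumed devices: four virtual disks presented to the guest as
# /dev/sdb../dev/sde, ideally attached to different PVSCSI controllers.
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
vgcreate data_vg /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Stripe the volume across all four disks with a 64 KiB stripe size, so
# IO issued to one filesystem is spread over four disk queues.
lvcreate --stripes 4 --stripesize 64 --size 500G --name data_lv data_vg
mkfs.xfs /dev/data_vg/data_lv
```

The application still sees one volume, but every large IO burst now draws on four disk queues instead of one.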
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com. By Michael Webster +. Copyright © 2012 – 2016 – IT Solutions 2000 Ltd and Michael Webster +. All rights reserved. Not to be reproduced for commercial purposes without written permission.
The very strange thing about all of this is that VMware still, even with the latest version of its vCenter Server Appliance, puts multiple disks all on the same single LSI Logic Parallel virtual disk controller.
Thank you for writing this blog post. Would you say that your opinion is that PVSCSI should now be a default in VMware environments, or should it still be used for performance on a case-by-case basis? I think as of one of the 5.5 updates, PVSCSI is supported for MSCS. Any other caveats for making this a default?
I think it should be the default now that it can be used for MSCS or Windows Failover Cluster. If you pre-tune your OS templates it makes it much easier to standardise great performance.
That makes sense. I’ve been looking around for information on this topic and this post was very helpful. At least having PVSCSI installed by default gives you higher queue depths and the ability to tune those higher if needed.
Thanks for this great post, like always.
Which tool can we use to measure an actual DB VM’s IO, and also the amount of queuing in the controller?
You can use this freeware https://labs.vmware.com/flings/io-analyzer it is versatile and easy to use.
Let me know once you get it tested or have found something else.
If all the disks are on the same datastore are you getting any benefits from multiple controllers?
Yes, as the controller can still cause OS bottlenecks.
I’m currently battling an issue very similar to this and had a few questions.
1) What type of log entries have you seen from Windows/SQL from the corruptions?
2) When the issue can be recreated, what type of traffic typically causes it? Just high IO?
Oddly enough, we’ve only had problems when running VM Hw Ver 13.
Just to give a little more info, the specific SQL Server having issues is W2012 R2 with SQL 2016. We have PVSCSI adapters but we haven’t set the SCSI queue depth in Windows.