Your database is just an extension of your storage. Many of you may have heard me say this before in presentations, such as at VMworld or a VMUG. Understanding this concept is the first step on the way to configuring your database servers for optimal performance. This is true regardless of whether the databases are deployed natively or virtualized, and regardless of whether they still live on mainframes, Unix platforms, or x86. It is one of the 5 key principles that I cover in detail in the book I'm co-authoring with Michael Corey and Jeff Szastak, Virtualizing SQL Server on VMware: Doing It Right, from VMware Press (available for pre-order, see my books page or the right sidebar). In this article I will take you through, at a high level, the areas that I have seen cause the biggest performance problems for database servers of all types, on all sorts of platforms, for over a decade. If you want more, and the detail, then you should seriously consider buying the book. In my experience these principles apply regardless of the platform, technology, or database management system you choose to use.
Because your database is an extension of your storage, most of the performance issues I see are storage related. This is just the high-level view of what you need to know to tackle this area, although, as you shall see, storage is intertwined with other resources as well.
Your database is a big cache sitting at the end of the storage chain, designed to retrieve and update data as quickly as possible. This cache effect is important because it means read IOs to back-end storage can be reduced if we allocate enough RAM. There is a direct trade-off between memory allocation and read IO performance: the more RAM, the fewer read IOs need to go to physical disk, the less burden on your storage resources, and the faster the read responses. If only you could load your entire database into memory (as you can with SAP HANA and Oracle TimesTen).
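To put some rough numbers on that trade-off, here is a minimal sketch of my own (the hit ratios and read rates are assumed figures for illustration, not measurements) showing how much read IO actually reaches physical storage once the buffer cache absorbs its share:

```python
# Hedged sketch: the more of the working set that fits in the database buffer cache (RAM),
# the fewer read IOs reach physical storage. The hit ratios and rates below are
# illustrative assumptions only.

def physical_read_iops(logical_read_iops: int, cache_hit_ratio: float) -> float:
    """Estimate the read IOs per second that miss the cache and hit physical disk."""
    return logical_read_iops * (1.0 - cache_hit_ratio)

# Example: 20,000 logical reads/sec at a 95% hit ratio vs the same workload at 99%.
print(physical_read_iops(20_000, 0.95))  # ~1,000 physical reads/sec
print(physical_read_iops(20_000, 0.99))  # ~200 physical reads/sec
```

A few percentage points of cache hit ratio, which in practice usually means more RAM, can translate into a several-fold difference in the read IO your storage has to service.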
When IOs do need to go to physical disk, for both reads and writes, the database expects fast and parallel access to storage resources. The parallel part is the most important: databases expect to be able to query and update many blocks and files all at the same time. As architects of a database platform, it is our job to remove, as far as possible, any bottlenecks that prevent the database from accessing all the storage resources it needs, at the degree of parallelism it needs. We need to architect each logical layer of the database for optimal parallel performance.
The first place we can get contention and bottlenecks is inside the database itself, in its caches. This can be caused by many queries trying to read and update the same database pages at the same time. It can also mean that you haven't split the data over enough data files to keep the system busy processing queries. So the first thing you need to do to reduce logical contention inside the database is to have enough data files.
The number of CPU cores plays a part in determining what is 'enough'. Each CPU core (physical or virtual) executes a single thread and can only do one thing at a time, even if it does it very fast. So if you want to keep your CPUs busy processing transactions and doing useful work (which you do), you need enough data files for all of your CPUs to have something to work on. The general rule of thumb to reduce logical contention and improve parallelism of access within a database is to have at least one data file per CPU core. A single data file and 64 CPU cores is probably not going to result in great performance, and it will also lead to other bottlenecks, which I will touch on next.
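As a simple illustration of that rule of thumb (a sketch of my own; the file naming and even sizing are assumptions, not a recommendation from the book), here is what a one-data-file-per-core plan might look like:

```python
# Hedged sketch: plan a data file layout from the "one data file per core" rule of thumb.
# The function name, file naming, and even sizing are illustrative assumptions only.

def plan_data_files(cpu_cores: int, total_data_gb: float) -> list[tuple[str, float]]:
    """Return (file_name, size_gb) pairs, one equally sized data file per CPU core."""
    file_count = max(cpu_cores, 1)              # at least one data file per core
    size_per_file = total_data_gb / file_count
    return [(f"data_{i:02d}.ndf", round(size_per_file, 1)) for i in range(file_count)]

# Example: an 8 vCPU VM with a 400 GB database -> 8 data files of 50 GB each.
for name, size_gb in plan_data_files(cpu_cores=8, total_data_gb=400):
    print(name, size_gb, "GB")
```

Those files can then be spread across multiple disk devices, which is where the next set of bottlenecks comes in.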
The next step in increasing the parallelism of a database system is to have access to wide and deep storage queues, and enough physical storage devices to handle the likely peak loads within a reasonable response time. When it comes to storage, queues are everywhere and we need to understand them. They are not necessarily a bad thing, unless they become a bottleneck, as they allow us to issue parallel IOs and receive more responses concurrently. We want to make sure we are not queuing IOs inside the operating system unnecessarily, which delays responses and puts additional load on the database server. So we want to get as many IOs as possible, as fast as possible, from the database system, through the operating system, to the storage controller and on to the physical storage.
The first place we encounter queues is the disk device where our database files are stored. Each individual disk device has a particular queue depth limit. The actual limit depends on the type of disk and the type of controller being used. For example, an LSI Logic controller has a queue depth limit of 32 outstanding IOs (OIO) per disk device. Other controllers may have other limits; some can be tuned (see this article) and some can't. This means, in this example at least, that you can issue a maximum of 32 IOs in parallel to that device. In most enterprise databases a single disk device does not have a deep enough queue to handle enough IOs in parallel to meet the performance requirements, and therefore multiple devices (disks or LUNs) are required.
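Here is a quick back-of-the-envelope sketch (my own figures, using the 32 OIO per device limit from the LSI Logic example above; adjust for your controller) of how many devices you need to sustain a given level of parallel IO:

```python
import math

# Hedged sketch: how many disk devices are needed to sustain a target number of
# outstanding IOs (OIO) in parallel, given a per-device queue depth.
# 32 OIO per device matches the LSI Logic example above and is an assumption otherwise.

def devices_needed(target_oio: int, per_device_queue_depth: int = 32) -> int:
    return math.ceil(target_oio / per_device_queue_depth)

# Example: a workload that needs 120 IOs in flight needs at least 4 devices at 32 OIO each.
print(devices_needed(target_oio=120))  # -> 4
```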
Once you deploy those multiple devices, you place your multiple data files over them to increase the parallelism of access. If your database doesn't allow easy striping of database files and tablespaces over multiple disk devices, you can use a logical volume manager to stripe the underlying disk devices and present them as a single volume to the operating system (this is before we even touch a SAN or any other type of underlying storage). This does increase management complexity somewhat compared to a simple filesystem, depending on the volume manager chosen. But you can't increase the number of data files and disk devices indefinitely without reaching our next bottleneck.
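If you want to picture how the data files end up spread over those devices, here is a minimal round-robin sketch (the device paths and file names are purely illustrative assumptions):

```python
from collections import defaultdict

# Hedged sketch: spread data files evenly across disk devices in round-robin fashion.

def spread_files(data_files: list[str], devices: list[str]) -> dict[str, list[str]]:
    layout = defaultdict(list)
    for i, f in enumerate(data_files):
        layout[devices[i % len(devices)]].append(f)
    return dict(layout)

files = [f"data_{i:02d}.ndf" for i in range(8)]
devices = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
print(spread_files(files, devices))
# -> each of the 4 devices receives 2 of the 8 data files
```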
The next bottleneck is the storage controller itself. Each storage controller has a queue limit as well. In the case of the LSI Logic SCSI controller it is 128 total outstanding IOs. So one controller can support up to 4 disk devices, each running at a queue depth of 32, before there is any contention or bottleneck, if all the queues are in use. This is even though an LSI controller can have 15 disks connected to it (more if it's an HBA). If you have too few queues available on a disk device you start queuing IOs in the operating system, and if you have too few queues on the storage controller you also start queuing IOs in the operating system. Every moment an IO is queued in the operating system is a moment it can't be serviced by the disk. Some storage controllers allow their queue depths to be modified and some don't. HBAs generally can have their queue depths modified, which involves tweaking the settings of the particular driver. One easy way around the per-controller limit, though, is to have multiple controllers.
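The same arithmetic works at the controller level. This sketch (again using the 32 OIO per device and 128 OIO per controller figures from the LSI Logic example above) shows how quickly additional controllers become necessary:

```python
import math

# Hedged sketch: how many storage controllers are needed so the per-controller queue limit
# does not throttle the combined queue depth of the attached devices.

def controllers_needed(device_count: int,
                       per_device_queue_depth: int = 32,
                       per_controller_queue_depth: int = 128) -> int:
    total_device_oio = device_count * per_device_queue_depth
    return math.ceil(total_device_oio / per_controller_queue_depth)

# Example: 12 devices x 32 OIO = 384 OIO, which needs at least 3 controllers at 128 OIO each.
print(controllers_needed(device_count=12))  # -> 3
```

With only 4 fully busy devices per controller before the queues saturate, a database VM with a dozen busy disk devices already justifies at least three virtual storage controllers.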
So now we have multiple data files, multiple disk devices and multiple storage controllers. From the database down through the operating system we have increased the parallelism of IOs and the number of queues the database has access to. The chances are, if we've given the database enough resources at this level, that the actual storage devices will now be the bottleneck, because there is a smooth IO path through the OS and we are not adding any unnecessary IO latency in our operating system configuration. Below are two example images of how this might look, one for Oracle and one for SQL Server.
In both of the example diagrams above the database is deployed in a virtual machine, but both have a similar layout with multiple disk devices and multiple storage controllers. With the SQL Server example you could add further storage controllers to increase queue depth and parallel IO access.
All of the above has really been about designing for performance. When you design for performance you will usually end up with sufficient capacity; if you design for capacity you will find that you end up with lots of unusable capacity once you run out of performance. The physical storage devices matter, but this article is not going to go into physical storage design. That is between you and your storage vendors. This is all about the OS and database layers and the bottlenecks that commonly occur there. However, I will cover one more topic briefly, and that is IO size.
Not all IOs are created equal. The IO pattern and the size of the IOs matter a lot. You will commonly see storage vendors quoting IO performance at a 4KB IO size and in particular IO patterns. Applications, however, usually use much larger IO sizes (8KB, 16KB, 64KB and higher); databases in particular can issue lots of random IO, which can be either mixed reads and writes or heavily write biased. The size of the IOs can have an impact on the controller's efficiency at processing them: lots of small IOs may have a higher overhead than a smaller number of larger IOs, even though they may deliver similar throughput. Randomness has an especially big impact on performance when relying on rotational disk media (spinning hard disks), as the heads have to move to the different block locations, causing delays that wouldn't exist for more sequential workloads. In the real world you see lots of random IO, and small IO sizes with lots of IOPS generally mean high overhead and low overall throughput.
So what does this mean? It means that an IOPS figure by itself is completely worthless. It gives you no information about performance, requirements, or constraints, and it does not allow you to compare against vendor benchmarks to estimate the performance you should expect (at least not in an accurate way). If you are issuing 512KB IOs then 300 IOPS might be a lot; if you are issuing 4KB IOs then 12K IOPS might be a lot. It all depends. Oracle databases issue a lot of 8KB and 16KB IOs for data files and 512KB or larger IOs for transaction logs. SQL Server issues lots of 64KB IOs. SQL Server issuing 5K random mixed IOPS at 64KB might be the equivalent of 160K 4KB IOPS, depending on the type and layout of the disks. The best way to understand your workloads is to baseline them and get to know the IO patterns; many monitoring tools make performance management easy even at large scale and give you good information. Treat any vendor performance data reported at a 4KB IO size, without mixed random workload patterns, with skepticism, and ask your vendors for application IO performance data instead, as that will give you a more accurate indication of capabilities.
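To make the point concrete, here is a minimal sketch (my own illustration) that converts the IOPS figures and IO sizes from the paragraph above into throughput:

```python
# Hedged sketch: an IOPS number only becomes meaningful once you multiply it by the IO size.
# The IOPS and IO size combinations come from the examples in the text above.

def throughput_mb_s(iops: int, io_size_kb: int) -> float:
    return iops * io_size_kb / 1024  # MB per second

print(throughput_mb_s(300, 512))    # 300 IOPS at 512KB -> 150.0 MB/s
print(throughput_mb_s(12_000, 4))   # 12K IOPS at 4KB   -> ~46.9 MB/s
print(throughput_mb_s(5_000, 64))   # 5K IOPS at 64KB   -> 312.5 MB/s
```

The 300 IOPS figure actually represents about three times the throughput of the 12K IOPS figure, which is exactly why an IOPS number on its own tells you so little.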
Final Word
I see database performance problems every week in my interactions with customers all over the world (which for me usually also means at all times of the day and night). Usually there are bottlenecks in the operating system design and the database is not able to issue enough IOs in parallel, primarily because database files, disk devices, and storage controllers have not been split out. These problems exist with physical databases just as much as they do with virtual databases, probably more with virtual databases, as some people think the rules of the operating system and virtual controllers have somehow changed what you should do for a database. They haven't. Virtualizing a database does not change operating system best practices much, if at all. The same things you did to a physical database to improve IO performance, as I've covered here, still apply in a virtual world. The queue limits of SCSI disks and SCSI controllers still exist, so you need to design your systems properly to avoid these bottlenecks where necessary. Hopefully, by giving you a high-level understanding of these concepts, I can help you avoid some performance traps.
I go into a lot more detail in my book, with many more examples, diagrams, and ways to tune and tweak your designs to improve performance. So if you are virtualizing SQL Server, or thinking about it, then Virtualizing SQL Server on VMware: Doing It Right might be a good resource for you. The book is available for pre-order now and will be published in August 2014. Michael Corey, Jeff Szastak and I will be doing author signings of the book at VMworld in the USA (and possibly Europe) and at vForum in Sydney; I hope to see you there.
As always your feedback and comments are appreciated.
—
This post appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster +. Copyright © 2014 – IT Solutions 2000 Ltd and Michael Webster +. All rights reserved. Not to be reproduced for commercial purposes without written permission.
“The bitterness of poor performance lasts long after the sweetness of a cheap price is forgotten”.
Excellent quote!
What should be the ideal IOPS to manage a standard data center cluster with 30-35 hosts with vxswitch configured?
Hi Shahbaaz, there is no right or wrong number of IOPS, and there is no typical or standard datacenter cluster in terms of IOPS either. It all depends on the workloads and their requirements. As an example, if you needed 4,000 IOPS per host at a 16KB IO size for a 100% random workload that was 50% read and 50% write, then you might need > 100K IOPS at that size for a cluster of that scale. Expressed in terms of the 4KB IO size used in standard vendor benchmarks, however, you could be looking at 4x that number.
Also, for a VMware vSphere environment your cluster size will be limited to 32 hosts; with other hypervisors or platforms that may not be the case. The best place to start is by analyzing the IO requirements of all of the workloads you plan to host in the cluster. If they are existing workloads then there are lots of tools that can help; if they are new workloads then you need to work with the application teams to estimate their requirements and IO sizes.
Based on testing I've done with the Nutanix platform, a 32 node cluster can produce over 128K IOPS at a 16KB IO size with a 100% random, 50% read / 50% write workload. But a Nutanix storage cluster is not limited to 32 hosts the way a VMware cluster is: you could have a 64 node Nutanix storage cluster servicing 2 x 32 node VMware vSphere clusters. Because the performance is linearly scalable, as you add nodes you add performance, so in this example you'd get over 256K of these large random mixed IOPS. This assumes your workloads scale out and don't need all of that performance on a single system. If you needed 100K of these large random mixed IOPS for a single VM, then a traditional storage array, provided it was designed correctly, would be a better solution and capable of providing it. The vast majority of workloads use < 1K IOPS with this type of IO profile and workload pattern.
How about a DB2 database?
Do you mean in terms of a diagram? The same fundamental concepts apply: you need to increase parallel access to storage resources. I don't have an example diagram for DB2 at the moment, though.
[…] Database Servers for Optimal Performance Some great lines in this article by Michael Webster. I love “your database is just an extension of your storage” and […]