Some time ago I wrote about the IO Blazing Datastore Performance with Fusion-io that I was able to achieve with a single VM connected to a single VMFS volume using thin provisioned VMDKs. Since then a new update of ESXi 5.0 has been released (U2), and new drivers and firmware have come out for Fusion-io. I was provided an additional Fusion-io ioDrive1 card to go alongside the ioDrive2 (both MLC based), and Micron sent me one of their PCIe cards to test (SLC based). So I thought I'd reset the baseline benchmarks with a single VM utilizing all of the hardware I've got and see what performance I could get. I was suitably impressed with max IOPS > 200K and max throughput > 3GB/s in different tests (see graphs below). This baseline will feed into further testing of ioTurbine for Oracle and MS SQL Server, which I will write about once it's done.
Firstly, thanks to Bruce Clarke, Simon Williams, and Sergey Omelaenko from Fusion-io for arranging the Fusion-io cards (ioDrive1 640GB and ioDrive2 1.2TB), to Jared Hulbert from Micron for arranging the Micron 320h card (SLC version), and to Dave Edwards at Micron for the experimental drivers and setup advice.
Disclosure: Although Fusion-io provided the two ioDrive cards and Micron provided a 320h SLC card on loan for me to test, they are not paying for this article and there is absolutely no commercial relationship between us at this time. The opinions in this article are solely mine and based on the results of the tests I conducted. Because the Fusion-io ioDrive cards are MLC based and the Micron card is SLC based, the testing could be considered an apples-to-oranges comparison; SLC performance is in general far higher than MLC, but it also comes at a considerable price premium.
Test Environment Configuration
As with the previous testing I used two of the Dell T710 Westmere X5650 (2 x 6 cores @ 2.66GHz) based systems from My Lab Environment, one with a 1.2TB ioDrive2 and a 640GB ioDrive1 Fusion-io card installed and the other with the Micron 320h; both systems have the cards installed in x8 Gen2 PCIe slots. I configured each card as a standard VMFS5 datastore via vCenter with Storage IO Control disabled. I installed the Fusion-io ioMemory VSL driver on the Fusion-io host, and the latest experimental driver I was provided on the host with the Micron card. The hosts are running the latest patched version of ESXi 5.0 U2, build 914586, and vCenter is version 5.0 U2. Drivers are also available for vSphere 5.1 for the Fusion-io and Micron cards to be used as datastores, and I will retest at a later date.
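As a quick sanity check of that setup, a short pyVmomi sketch like the one below can list each VMFS datastore along with its VMFS version and whether Storage IO Control is enabled. This isn't part of the test procedure, just a convenient way to verify the configuration; the vCenter hostname and credentials are placeholders, and depending on your pyVmomi version you may need to handle SSL certificate validation.

```python
# Sketch: list VMFS datastores with version, capacity and SIOC state.
# Hostname/credentials are placeholders for my lab, not from the tests.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.lab.local",
                  user="administrator", pwd="password")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.Datastore], True)

for ds in view.view:
    s = ds.summary
    if s.type == "VMFS":
        print("%s: VMFS %s, %d GB, SIOC enabled: %s" % (
            s.name, ds.info.vmfs.version,
            s.capacity // (1024 ** 3),
            ds.iormConfiguration.enabled))

Disconnect(si)
```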
I used a single Windows 2008 R2 64-bit VM and a single SUSE Linux Enterprise Server 11 SP1 VM for the tests. Only one VM at a time was active on the datastore. Each VM was configured with 8GB RAM, 6 vCPUs, and 4 SCSI controllers, one for the OS VMDK and the others for the IO test VMDKs, as depicted in the diagrams below. The Windows 2008 R2 VM used LSI Logic SAS for its OS disk and PVSCSI for the IO test disks, whereas the SLES VM used PVSCSI for all disks. IO load was only placed on the IO test disks during my testing. The testing of the 2 x Fusion-io ioDrive cards was done by splitting the VMDKs evenly over both cards to get the combined performance (second image below).
Micron Test VM Configuration:
2 x Fusion-io Test VM Configuration:
Notice in the diagrams above that I'm using thin provisioning for all of the VMDKs, and all of this is going through a single VMFS5 datastore for the Micron, or two VMFS5 datastores in the case of the Fusion-io tests. So if anyone tries to tell you there is a performance issue with thin provisioning or VMFS datastores, you can point them to this example.
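For reference, the sketch below captures how I read the VMDK layout from the diagrams: twelve thin provisioned test VMDKs spread over PVSCSI controllers 1-3, alternated across the two datastores for the 2 x Fusion-io run. The datastore names and the exact per-controller placement are my assumptions from the diagrams, not something prescribed by the tests.

```python
# Rough VMDK layout as read from the diagrams above. For the Micron run
# the datastore list has a single entry; names are placeholders and the
# per-controller placement is my assumption.
datastores = ["ioDrive2-ds", "ioDrive1-ds"]

layout = []
for i in range(12):
    layout.append({
        "vmdk": "iotest-%02d.vmdk" % (i + 1),
        "controller": "SCSI %d (PVSCSI)" % (1 + i % 3),  # OS disk sits on SCSI 0
        "datastore": datastores[i % len(datastores)],    # 6 VMDKs per card
        "provisioning": "thin",
    })

for disk in layout:
    print(disk)
```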
As with the previous tests, to generate the IO load I used IOBlazer, a fling from VMware Labs. It was very easy to deploy and use and offers the full range of IO testing capabilities you would expect, with the additional advantage of being able to replay IO patterns recorded using vscsiStats. In my testing I used standard random and sequential read and write IO patterns of varying sizes. In each test I used 12 worker threads (one for each disk). With the exception of the 1M IO size test, where I used 1 outstanding IO, I used 8 outstanding IOs per worker thread. I did this to limit the amount of IO queueing in the vSphere kernel due to the very small queue depth on the Fusion-io cards (32 OIOs), which wasn't a problem on the Micron (255 OIOs). As a result queueing in the kernel was kept to a minimum to ensure the best possible latency results. For the Micron tests I used a 4K alignment offset to get the best performance; with 512B alignment the performance wasn't as good. This wasn't an issue with the Fusion-io cards.
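To make the queueing arithmetic explicit, here is a rough back-of-the-envelope sketch of outstanding IO versus device queue depth for the settings above. The even 6/6 split of workers across the two Fusion-io cards is my assumption based on the VMDK layout, not a measured figure.

```python
# Outstanding IOs per device vs. device queue depth, using the settings
# quoted above (8 OIOs per worker; queue depth 32 per Fusion-io card,
# 255 for the Micron). Anything beyond the device queue depth waits in
# the vSphere kernel.
OIO_PER_WORKER = 8

def kernel_queue(workers, device_queue_depth):
    outstanding = workers * OIO_PER_WORKER
    return outstanding, max(0, outstanding - device_queue_depth)

for label, workers, qd in [("Micron 320h (12 workers, QD 255)", 12, 255),
                           ("Fusion-io per card (6 workers, QD 32)", 6, 32)]:
    outstanding, queued = kernel_queue(workers, qd)
    print("%s: %d outstanding, ~%d queued in the kernel"
          % (label, outstanding, queued))
```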
Disclaimer
As with the previous tests, the IO patterns I tested are not application realistic and do not demonstrate the mix of IO patterns that you would normally expect to see in a real production environment with multiple VMs of different types sharing a datastore. The tests are for the purpose of demonstrating the limits of the different PCIe flash cards when driven at peak performance with four different fixed IO patterns: 100% read 100% random, 0% read 100% random, 100% read 0% random, and 0% read 0% random. In real production workloads (a shared datastore with different VM types) you would not normally see these patterns, but rather a wide variety of randomness, read/write mix, and IO sizes, except where one VM has multiple datastores or devices connected to it (big Oracle or SQL databases), in which case some of these patterns may be realistic. So consider the results the top end of the scale, and your mileage may vary. Due to the technical differences between the MLC-based Fusion-io cards and the SLC-based Micron card this isn't an apples-to-apples comparison. The tests were limited to short durations (60 seconds), so there was no opportunity to determine how the different cards perform with regard to endurance. Also, not all of each card's capacity was used or overwritten during the tests.
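Laid out as a simple matrix, the fixed patterns look like this. The IO size list is a placeholder (the runs used varying sizes and only the 1M case is named explicitly above); the duration, worker count and OIO settings are as described earlier.

```python
# The four fixed access patterns as a test matrix. IO sizes here are
# placeholders; 60 s duration, 12 workers and the OIO settings match the
# description above.
from itertools import product

read_pct = [100, 0]       # 100% read or 100% write
random_pct = [100, 0]     # fully random or fully sequential
io_sizes_kb = [4, 1024]   # placeholder sizes; only the 1M case is explicit

for read, rnd, size in product(read_pct, random_pct, io_sizes_kb):
    oio = 1 if size == 1024 else 8   # 1 OIO per worker for the 1M runs
    print("read=%3d%% random=%3d%% size=%4dKB workers=12 oio=%d duration=60s"
          % (read, rnd, size, oio))
```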
IO Blazing Single VM Storage Performance
Random Read IOPS and Latency
Random Read Throughput
Random Write IOPS and Latency
Random Write Throughput
Sequential Read IOPS and Latency
Sequential Read Throughput
Sequential Write IOPS and Latency