6 Responses

  1. ericcsinger

    Hi Michael,

    It’s refreshing to see an HCI single-VM benchmark. I’ve been asking for this from vSAN, Datrium and even you guys (Nutanix) for a little while now, for the very reason that you brought up. My DBAs don’t care that an entire cluster can do 1.2 million IOPS @ <1ms latency; they care what their one big server's performance is. So kudos to you guys for taking the first step, and I say that as a person who's not really a fan of HCI (yet).

    What I would love to see you guys (and other hyper / "open" converged vendors) show is a more realistic scenario (or scenarios). 33 disks in a single VM, even a large one, is likely not realistic. It is no more realistic than the VMware benchmarks you outlined. I'm sure there are edge cases where folks have that many disks in an attempt to squeeze every last IO out (or to show what a single VM can do, as in your case).

    IMO, what might be more useful for folks is showing what a single VM with a single disk can deliver. That's probably 80% of the VMs out there. Meaning, if I fire up IOMeter on a single vdisk, what kind of IO can you deliver? Maybe even show it scaling up to 33 disks if you wanted: start with one disk, then go to two, four, eight, sixteen and thirty-two. Then, when comparing HCI solutions, we can look at which vendor provides the best single VM performance *and* aggregate VM performance.

    I also feel it's important for HCI vendors to show what their resiliency settings are, especially when it comes to write benchmarks. It's like being required to show your work in school. If one vendor can only hit "x" write IOPS with their resiliency set to one host failure, how realistic is that configuration compared with something like three host failures, which is probably more common?

    Anyway, really great article, and thanks for demonstrating what a single VM can drive in your solution.

    1. venkitac

      Hi Michael, I read through your blog. You still have not posted fio latency. The blog only has “cluster wide controller latency”, and that is an internal Nutanix metric that is not relevant to apps. Apps care about latency as seen by the app. Care to share the fio latency? Thanks.

    2. Scott Lewis

      Eric, I would acknowledge that 33 disks is not likely to be representative of the real world, but in all fairness, neither is one disk. High IO servers have for years been configured with multiple virtual disks to get extra performance. A common example: SQL Server with a separate boot partition, DB drive, log drive and TempDB drive, all on separate paravirtualized interfaces.

      The recent work Nutanix is doing in AHV has been impressive, and where queue depth has been a limiting factor in the past, I’m impressed with where they’re going.

    3. @vcdxnz001

      Hi Eric, some good points there, thanks for the feedback. 1 Million IOPS is not real world or app specific; it's just a number to show that a single VM can do an unrealistically high IO workload. There are just too many variables in how apps and DBs work, and no two environments are the same, so the numbers aren't directly comparable. But you hit the point of this, which is to show how a single large VM, such as a DB, might behave.

      To get to single vDisk performance you can just divide this number by 32, as the OS disk was issuing no IO (roughly 40K IOPS per vDisk). The scalability is linear, which is one of the main benefits of a scale-out platform. There are many problems with synthetic tests of any sort, and IO size, pattern, randomness, IO type and other factors mean that your mileage will always vary. That's why I included data from multiple tests and multiple scenarios, to give some sort of indication. Also, there is plenty of data already published for single vDisk multiple VM tests and other scenarios not covered by this post.

      For most large databases, especially the critical high performance ones, there will be multiple vDisks. This has been a best practice since before HCI was invented, so single vDisk tests aren't as relevant. For example, a large healthcare SQL Server data warehouse app would usually have 8 drives for data files, another 8 for TempDB, one for the Transaction Log of the data files and one for the Transaction Log of TempDB. So you'd end up with about 20 vDisks, assuming this is not a system with tens or hundreds of databases on the same instance (which can be the case as well). For the config DBs, the number of vDisks is less important.

      In the case of that healthcare system specifically, the performance of a single VM on the system I had was 14GB/s for large read IOs, which simulates how the app does reporting, and 6GB/s for the ETL data load portion. This would cover a significantly large proportion of very large data warehouse and OLTP database environments. It should also be noted that these numbers aren't maximums, as the cluster I had was limited in terms of node count. We could quite easily keep scaling out the cluster to increase performance, in addition to adding more VMs and spreading the workload across them.

  2. First experiments with Nutanix AHV Turbo. – DontPokeThePolarBear

    […] vDisks required” point is also verified in Michael Webster’s post 1 Million IOPS in 1 VM – World First for HCI with Nutanix. Where he states “The VM used is configured with locally attached disk volumes, 33 in total […]
