On 2nd July Nutanix officially released the GA of their WebScale IT Operating System – NOS 4.0.1. This marks the general availability of the 4th major revision of the Nutanix OS. A few articles have already been written about the NOS 4.0.1 release, which can be found at nutanixindex.com, nutanixbible.com and one by VCDX56 – Magnus Andersson. In this article I intend to dig under the covers a little and show you some of the things behind the scenes that make this such a great release, and why you should upgrade to it, or consider moving to Nutanix because of it.
Working for Nutanix is like being on board a rocket ship that is just about to leave Earth’s atmosphere. The growth of the company is amazing, and it’s only accelerating. In part this is because of the way we are disrupting traditional IT infrastructure by bringing the benefits, simplicity and standardization of Web Scale IT to all scales of IT and all types of workloads. This is so organisations can concentrate more on applications and business value, and less on architecting and managing the nuts and bolts of infrastructure. It is also in part because of the way we do our software releases and how quickly we can bring new features to market, and improve performance as we go, all without changing the hardware underneath.
We have an amazing team of engineers and a focus on delivering an infrastructure that is always on, self-tuning, self-healing, and manageable and upgradable non-disruptively, at any scale. All of this DNA, as well as the complete company focus on great customer experiences, has been packed into the latest NOS 4.0.1.
Where is NOS 4.0 and why is the GA version 4.0.1?
Infrastructure needs to be rock solid. Just because you have software-defined infrastructure doesn’t mean it should be any less rock solid. If your business relies on it, it needs to work. None of the great work we do in engineering is any good if we don’t have great QA and if customers can’t have the confidence to adopt the new technology as fast as we can release it. We go to great lengths to QA things so you can have confidence to run them in production when they become available, and start to see immediate value. This is one of the reasons the GA version of our 4th release is 4.0.1 and not just 4.0 (It’s not just Marketecture).
Nutanix released 4.0 as an early adopter (EA) release to a group of customers and our partners once we had finished our own intensive internal QA and were happy it was fit for production use. Before the EA release we had a series of internal release candidate (RC) releases (on top of betas and nightly master builds), which were applied to a large number of internal environments and heavily QA’d by our test team. I used these RCs to test with Oracle RAC under extreme load conditions (all Nutanix VCDXs are involved in assuring product quality, and we have more VCDXs than any other company working on a storage product). This is all in addition to our nightly automated functionality and performance testing, which also allows us to catch defects early. The net result is that every release goes through hundreds of person-years of testing and we deliver higher quality as a result. You should take a look at the release notes for NOS 4.0.1 and see what changed between the versions.
QA is an important part of ensuring great customer experiences and of ensuring customers have confidence to adopt technology and get value from it sooner. This helps improve ROI and TCO, and is one of the ways that Web Scale and Cloud environments work to make customers more productive. You get the benefit whether you’re running Nutanix in your own software-defined datacenter or private cloud, or hosting with a Nutanix Powered Service Provider. We invest heavily in QA so you can get the benefits of more reliable and functional software and not have to do as much testing yourselves. I think this sets Nutanix apart, as many software companies will release GA code before it’s ready, often just expecting the customers to do a large part of the QA, and are then slow to respond to bugs. I also like that Nutanix is very paranoid about data integrity, as all storage related companies should be. All software has bugs; there is no such thing as defect-free software. Nutanix just takes the war on bugs and defects to a new level, so customers get the quality they expect and deserve and benefit from constant, never-ending improvement.
Same Hardware, Better Performance
One of the things Nutanix tries to do with each release is improve on performance. This means you get more performance from exactly the same hardware platform, which results in better value. This is one of the benefits of software-defined storage and Web Scale IT. All the testing and research into performance that we do gets baked into the product. For the 4.0 and 4.0.1 releases we made some changes to our network stack, and also to how we handle reads and writes. The net result is up to 50% improvement in random write performance, lower overall CPU utilization, and improved random read performance. One of the ways we achieved this is by striping our OpLog, which receives and buffers all writes on SSD, across all SSDs in the Nutanix system. The image below shows what this looks like.
The Persistent Write Buffer is the OpLog. All data is synchronously stored on 2 or 3 nodes (with availability domain awareness), in SSD, before being acknowledged. Writes are guaranteed to be persistent, even if the power fails. By using all the SSDs available on the nodes and effectively striping the OpLog across all of them, we were able to greatly increase random write performance. The OpLog is designed to take in random writes fast and later asynchronously drain them off to the Extent Store, in SSD first, and then HDD if the data becomes cold. This process just got more efficient.
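To make the idea concrete, here is a toy model of a write buffer striped across several SSDs, with a write acknowledged only after every replica has stored it. This is a sketch under stated assumptions, not how NOS is implemented: the names `Ssd`, `StripedOpLog` and `replicated_write` are hypothetical, round-robin placement is assumed purely for illustration, and real persistence, failure handling and tiering are left out.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Ssd:
    """Stand-in for one SSD holding a slice of the write buffer (hypothetical)."""
    name: str
    entries: list = field(default_factory=list)


class StripedOpLog:
    """Toy write buffer that spreads incoming writes over all SSDs in a node."""

    def __init__(self, ssds):
        if not ssds:
            raise ValueError("at least one SSD is required")
        self.ssds = list(ssds)
        # Assumption: simple round-robin placement, chosen only for illustration.
        self._next = cycle(self.ssds)

    def append(self, data: bytes) -> str:
        """Buffer one write on the next SSD and return that SSD's name."""
        ssd = next(self._next)
        ssd.entries.append(data)
        return ssd.name

    def drain(self, extent_store: list) -> int:
        """Move every buffered write to the extent store; return how many moved."""
        drained = 0
        for ssd in self.ssds:
            drained += len(ssd.entries)
            extent_store.extend(ssd.entries)
            ssd.entries.clear()
        return drained


def replicated_write(oplogs, data: bytes) -> bool:
    """Acknowledge a write only after every replica's buffer has stored it."""
    placements = [oplog.append(data) for oplog in oplogs]
    return len(placements) == len(oplogs)


# Two nodes with two SSDs each, so every write lands on both nodes.
node_a = StripedOpLog([Ssd("a-ssd0"), Ssd("a-ssd1")])
node_b = StripedOpLog([Ssd("b-ssd0"), Ssd("b-ssd1")])
for i in range(4):
    acknowledged = replicated_write([node_a, node_b], f"write-{i}".encode())

print([len(ssd.entries) for ssd in node_a.ssds])  # writes per SSD on node A
```

With more SSDs in a node, each device takes a smaller share of the incoming writes, which is the intuition behind the random write improvement described above.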
In addition to changes to the OpLog, we changed how we read and write IOs by making more use of direct IO (O_DIRECT) and async IO. This allows fewer threads to get more IO work done, so more work is completed for less CPU consumption. This change improves the performance even of systems that have only a single SSD, which would not benefit from the Multiple OpLog feature.
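The "fewer threads, more IO" point can be illustrated with a small asynchronous sketch. It does not use O_DIRECT or touch a disk at all: `asyncio.sleep` stands in for an in-flight storage request, and the function names `fake_io` and `run_batch` are hypothetical. It only shows the general pattern of one thread keeping many requests in flight instead of dedicating a blocked thread to each.

```python
import asyncio
import threading


async def fake_io(request_id: int, latency_s: float, seen_threads: set) -> int:
    """Pretend to issue one IO; record which thread handled it."""
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(latency_s)  # stands in for waiting on the device
    return request_id


async def run_batch(n_requests: int, latency_s: float = 0.01):
    """Issue all requests at once and wait for every one to complete."""
    seen_threads = set()
    tasks = [fake_io(i, latency_s, seen_threads) for i in range(n_requests)]
    results = await asyncio.gather(*tasks)
    return results, seen_threads


results, threads = asyncio.run(run_batch(100))
print(len(results), "requests completed on", len(threads), "thread(s)")
```

Because the single thread is never parked waiting on one request, the CPU cost per IO is lower than with a thread-per-request model.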
The enhancements made to the network stack help all of this work better, both improving performance and reducing latency. You should notice lower latency compared to previous releases. All of these changes are made to improve performance from small scale (three nodes) up to whatever scale you want, and across all the node types that Nutanix supports.
Even though performance was already good, we’re still not satisfied. We invest a lot in performance research and are always looking for ways to improve performance for our customers on the same hardware that they’ve invested in. Our performance research will continue, and it’s not just down in the depths of storage IO performance, but also application performance. The team at Nutanix that I’m part of is responsible for solutions and performance engineering. So we also research application performance, such as VDI workloads (Citrix and View) and Business Critical Apps such as Exchange, SQL Server, Oracle, SAP and Enterprise Java, and work at enhancing the whole platform from top to bottom for all the different types of workloads that will run on it. You can be sure you’ll see a lot more from us in terms of best practices for these apps, as well as improvements in overall platform storage performance. All this is part of a drive for constant, never-ending improvement and great customer experiences.
Nutanix puts the hard work in so that customers and partners can wear the Nutanix grin. We work hard to take complex things and make them very simple for our customers, while the environment is always on. Web Scale IT Infrastructure is for everyone; Nutanix just had to take all the technology and the lessons and make them simple and easy to consume. I hope you’ve enjoyed this brief look under the covers of some aspects of the 4.0.1 release, which is a mature and robust Web Scale IT Virtual Computing platform for the masses. As always, your comments and feedback are welcome.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com. By Michael Webster. Copyright © 2012 – 2014 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.