There is a lot of FUD about Data Corruption, Torn IO, Write Ordering and other aspects of using NFS as a datastore in VMware vSphere, even when the VMs are configured to use Virtual Disks. This is surprising, especially given that some very large VMware vSphere based clouds are built on NFS storage presented as datastores for use with VMs, and that numerous companies have been running business critical apps on NFS, as datastores or otherwise, for years. Many of you may not know that VMware has actually patented the process for presenting NFS as a datastore to VMs that use virtual SCSI disks so that it emulates the SCSI protocol (US7865663). You also may not know that not all storage systems honour all of the techniques that keep your data safe, even when they present block based storage such as FC, FCoE or iSCSI. A lot of it comes down to the individual storage system implementation. Enterprise storage systems that take data protection seriously and implement the appropriate IO protections are suitable for running business critical apps, even when presenting NFS for use as a datastore to VMware vSphere. So what do you need to know?
Fortunately, in this fight against the FUD, Josh Odgers (www.joshodgers.com) has done the hard work and research and written some excellent articles on this topic. He is also working with VMware to have knowledge base articles written or updated. Here are the highlights, followed by links to all of Josh’s articles.
Fight the FUD: NFS as a Datastore with VMware vSphere
1. Emulation of the SCSI Protocol
When using NFS as a datastore, VMware emulates the SCSI protocol, including resets and aborts. This is a complete abstraction: virtual machines using virtual SCSI disks have no knowledge of the underlying storage. The protocol retains its IO integrity regardless of how the underlying storage is presented. This is a patented and thoroughly tested process.
2. Forced Unit Access and Write Through
Forced Unit Access (FUA) and Write Through ensure that IOs are written to persistent media, so that when a write is acknowledged it has actually been written and protected. This is honoured by the VMware vSphere hypervisor, but it is also the responsibility of the underlying storage system, and not all storage systems honour this requirement completely. If IOs are acknowledged but not persistently written, data loss can result. This can be a problem for any storage system, regardless of whether it presents block storage or NFS. Every system is different and implements data protection in its own way, so if you are running business critical apps on your storage systems, ask your vendor how they meet this requirement and protect your data.
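To make the acknowledgement rule concrete, here is a minimal Python sketch built on a hypothetical in-memory `ToyDisk` model (not VMware's or any vendor's implementation). The guest sees the same "ACK" in both modes; only in write-through mode does the acknowledged data survive a power failure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyDisk:
    """Hypothetical storage model: a volatile cache in front of persistent media."""

    write_through: bool  # True models FUA / write-through semantics
    cache: Dict[int, bytes] = field(default_factory=dict)       # volatile, lost on power failure
    persistent: Dict[int, bytes] = field(default_factory=dict)  # survives power failure

    def write(self, lba: int, data: bytes) -> str:
        if self.write_through:
            self.persistent[lba] = data  # persist first, then acknowledge
        else:
            self.cache[lba] = data       # acknowledge while data is only in volatile cache
        return "ACK"                     # the guest sees the same ACK either way

    def destage(self) -> None:
        """Flush cached writes to persistent media (normally done later, in the background)."""
        self.persistent.update(self.cache)
        self.cache.clear()

    def power_failure(self) -> None:
        """Anything that has not reached persistent media is lost."""
        self.cache.clear()

    def read(self, lba: int) -> Optional[bytes]:
        if lba in self.cache:
            return self.cache[lba]
        return self.persistent.get(lba)


for write_through in (True, False):
    disk = ToyDisk(write_through=write_through)
    ack = disk.write(0, b"commit record")
    disk.power_failure()
    print(write_through, ack, disk.read(0))
# True ACK b'commit record'
# False ACK None
```

In real arrays a battery-backed or flash-backed write cache counts as persistent for this purpose; the unsafe case is acknowledging from cache that would not survive a power loss.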
3. Write Ordering
Writes must be committed in the order they are sent. Otherwise you can end up with data corruption, where a read returns stale data instead of what was last written, or newer data is overwritten by an older write that arrived out of order. VMware specifically covers write ordering in KB 1012143, which says: “Write ordering and write-through integrity for NFS storage are both satisfied with NFS in an VMware ESX environment. An NFS datastore, when mounted on an ESX host, goes through virtual SCSI emulation. A virtual machine disk (VMDK) file on an NFS datastore appears as a SCSI disk within the virtual machine’s guest operating system, which is no different than one residing on a VMFS volume over FCP or iSCSI protocol. Therefore, write ordering and write-through integrity are no different than those with block based storage (such as iSCSI or FC protocol).”
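Why ordering matters can be shown with a tiny Python sketch. The `(lba, data)` write representation and the `apply` helper are hypothetical, chosen only for illustration: two acknowledged writes to the same block leave different final contents depending on the order in which the storage applies them.

```python
from typing import Dict, List, Tuple

Write = Tuple[int, bytes]  # (logical block address, data); hypothetical representation


def apply(writes: List[Write]) -> Dict[int, bytes]:
    """Apply writes to an empty toy disk in the given order; later writes overwrite earlier ones."""
    disk: Dict[int, bytes] = {}
    for lba, data in writes:
        disk[lba] = data
    return disk


issued = [(7, b"old balance"), (7, b"new balance")]  # order the guest sent them
in_order = apply(issued)                             # storage that honours write ordering
reordered = apply(list(reversed(issued)))            # storage that does not

print(in_order[7])   # b'new balance'
print(reordered[7])  # b'old balance': stale data, although both writes were acknowledged
```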
4. Torn Writes
Torn writes occur when only part of a multi-sector update is written successfully to disk. If a multi-sector write fails partway through and the write is still acknowledged, the result can be silent data corruption. Storage systems implement torn write protection in different ways. The important thing to note is that this problem relates to the underlying storage system implementation, not to the storage protocol or the hypervisor. There is therefore as much risk on block based storage as there is on NFS, and it comes down to your vendor’s data protection implementation. Whether you are the virtualization admin or the storage admin, you should know how your storage system protects against torn writes. Not all storage systems are created equal when it comes to data protection and IO integrity with regard to torn writes, but this is not a problem caused by VMware vSphere or by using NFS as the datastore.
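As one illustration of how a storage system might at least detect a torn write, here is a minimal Python sketch that assumes a hypothetical scheme: a digest of the intended block is stored alongside it and checked afterwards. The 512-byte sector size and the helper names are assumptions for the example, not any vendor's design.

```python
import hashlib

SECTOR = 512  # bytes per sector (assumed for illustration)


def write_sectors(media: bytearray, data: bytes, sectors_completed: int) -> bytes:
    """Copy `data` onto `media` one sector at a time, stopping after
    `sectors_completed` sectors to simulate a failure mid-write.
    Returns a SHA-256 digest of the intended data, standing in for the
    checksum a hypothetical array would store alongside the block."""
    for i in range(sectors_completed):
        start, end = i * SECTOR, (i + 1) * SECTOR
        media[start:end] = data[start:end]
    return hashlib.sha256(data).digest()


def is_torn(media: bytearray, expected: bytes) -> bool:
    """A block is torn if what is on the media no longer matches the stored digest."""
    return hashlib.sha256(media).digest() != expected


old = bytes([0xAA]) * (4 * SECTOR)  # existing 4-sector block
new = bytes([0x55]) * (4 * SECTOR)  # update the guest wants to write

media = bytearray(old)
digest = write_sectors(media, new, sectors_completed=2)  # fails after 2 of 4 sectors
print(is_torn(media, digest))  # True: half new data, half old data

media = bytearray(old)
digest = write_sectors(media, new, sectors_completed=4)  # write completes
print(is_torn(media, digest))  # False
```

Detection alone does not restore the data; a real system also needs a way to keep a consistent version, for example by writing new data to a fresh location before switching to it.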
5. Data Corruption
Data corruption can occur for a number of reasons and depends on the underlying storage system implementation, so corruption handling mechanisms in your storage systems are essential. Writes should be protected by sector-level and ideally block-level checksums, and verifying the checksum on reads detects corrupted data before it is returned. Backup and recovery solutions are also essential. Enterprise grade storage solutions that use checksums to verify data integrity on writes and reads have a much lower risk of data corruption, regardless of media type and storage protocol. JBOD style deployments using SATA drives have a significantly higher risk of data corruption. See Josh’s article below for the data, based on a study of over 1 million disks in production.
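The checksum-on-write, verify-on-read pattern described above can be sketched in a few lines of Python. `ChecksummedStore` is a hypothetical class using CRC32 for illustration; real systems choose their own checksum algorithms and metadata layout, and may repair from a redundant copy rather than simply failing the read.

```python
import zlib
from typing import Dict, Tuple


class ChecksummedStore:
    """Hypothetical block store: CRC32 recorded on write, verified on read."""

    def __init__(self) -> None:
        self.blocks: Dict[int, Tuple[bytes, int]] = {}  # lba -> (data, crc32 of data)

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = (data, zlib.crc32(data))  # checksum computed on write

    def read(self, lba: int) -> bytes:
        data, crc = self.blocks[lba]
        if zlib.crc32(data) != crc:  # checksum verified on every read
            raise OSError(f"checksum mismatch on block {lba}")
        return data


store = ChecksummedStore()
store.write(0, b"customer ledger")

# Simulate a single flipped bit on the media, behind the store's back.
data, crc = store.blocks[0]
store.blocks[0] = (bytes([data[0] ^ 0x01]) + data[1:], crc)

try:
    store.read(0)
except OSError as err:
    print(err)  # checksum mismatch on block 0
```

Without the read-side check, the flipped bit would be returned to the application as if it were valid data, which is exactly the silent corruption the checksums are there to catch.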
Josh Odgers Fights the FUD of VMDKs on NFS Datastores:
Part 1 – Emulation of the SCSI Protocol
Part 2 – Forced Unit Access (FUA) & Write Through
Part 3 – Write Ordering
Part 4 – Torn Writes
Part 5 – Data Corruption
Final Word
This article focused on important storage requirements that protect the integrity of IO and your data, both in flight and at rest. It covered virtual machines using SCSI virtual disks where the underlying storage is presented to the VMware vSphere hypervisor as NFS, not virtual machines that access an NFS system directly. Many of the requirements come down to the individual vendor’s implementation of their storage system, regardless of the protocol being used. VMware operates a storage system certification process for a reason: to reduce the risk to you, including ensuring that NFS is properly implemented on systems that use it with VMware vSphere. You can have confidence that using NFS as a datastore with VMware vSphere is suitable for even business critical applications such as Oracle, SQL Server and Exchange. If you’re a Nutanix customer, Nutanix will fully support you when running any application where NFS is presented to VMware vSphere as the datastore, and this includes Exchange.
—
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com. By Michael Webster. Copyright © 2012 – 2014 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.