If you are a Zerto customer, note that Zerto has released a critical patch that fixes an IO path bug in which SCSI sense codes were modified before responses were sent back to guest VMs. Unpatched, the bug could result in data integrity issues. This has been seen in the field, particularly with Microsoft SQL Server and Oracle databases where the storage was under constant load and a storage controller upgrade was being performed online. It is therefore imperative that you review the Zerto release notes and upgrade to Zerto 5.0 U2 ASAP.
The underlying problem was related to the SCSI sense codes that the storage system sends to the guest OS during certain storage controller failover operations. The bug prevented proper processing of the SCSI sense code (the code was being modified and truncated by the Zerto software), which caused the OS to conclude, wrongly, that a write operation had completed when in fact it had not. The issue surfaced on Nutanix systems because of how frequently the storage controllers (CVMs) are updated. The Nutanix business critical apps team reproduced the issue and worked with Zerto to get it corrected, and we have confirmed that Zerto 5.0 U2 resolves it. Zerto is a great partner, and they were extremely helpful in resolving this issue and coordinating with their impacted customers.
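To illustrate why truncated sense data is dangerous, here is a minimal sketch that decodes the sense key, ASC and ASCQ from SCSI sense data, assuming the standard fixed (response code 0x70/0x71) and descriptor (0x72/0x73) formats. It is not Zerto's code and does not reproduce the bug; the names `SenseInfo` and `decode_sense` are hypothetical. It only shows that once the buffer is cut short, the fields the guest would use to decide what went wrong (and whether to retry) are simply no longer there.

```python
from typing import NamedTuple, Optional


class SenseInfo(NamedTuple):
    """Core fields of SCSI sense data (hypothetical helper type)."""
    sense_key: int  # e.g. 0x2 NOT READY, 0x6 UNIT ATTENTION
    asc: int        # ADDITIONAL SENSE CODE
    ascq: int       # ADDITIONAL SENSE CODE QUALIFIER


def decode_sense(buf: bytes) -> Optional[SenseInfo]:
    """Decode sense key/ASC/ASCQ, or return None if the buffer is too
    short (or in an unrecognised format) to carry them."""
    if not buf:
        return None
    response_code = buf[0] & 0x7F  # bit 7 is VALID (fixed) or reserved
    if response_code in (0x70, 0x71):
        # Fixed format: sense key in the low nibble of byte 2,
        # ASC and ASCQ in bytes 12 and 13.
        if len(buf) < 14:
            return None
        return SenseInfo(buf[2] & 0x0F, buf[12], buf[13])
    if response_code in (0x72, 0x73):
        # Descriptor format: sense key, ASC and ASCQ in bytes 1-3.
        if len(buf) < 4:
            return None
        return SenseInfo(buf[1] & 0x0F, buf[2], buf[3])
    return None  # vendor-specific or unknown response code


# 18-byte fixed-format sense: UNIT ATTENTION (0x6), ASC/ASCQ 0x29/0x00
full = bytes([0x70, 0x00, 0x06, 0, 0, 0, 0, 0x0A,
              0, 0, 0, 0, 0x29, 0x00, 0, 0, 0, 0])
print(decode_sense(full))      # SenseInfo(sense_key=6, asc=41, ascq=0)
print(decode_sense(full[:8]))  # None: cut off before ASC/ASCQ
```

The truncated case returns `None` rather than guessing, which is the point: a consumer that silently treats missing sense information as "no error" is exactly the kind of failure described above.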
I’ve personally been involved in a number of very successful large-scale (many PB of data) projects where Zerto was used for migration and DR, and have not experienced this issue. I’ve also helped a number of impacted customers recover from it. It’s good to know that once the issue was found, Zerto and Nutanix jointly fixed it and assisted the impacted customers.
Derek Seaman was the first to write about this issue being resolved here.
Here is an image of the Zerto release notes. Although the notes mention Nutanix, any storage system could potentially be at risk because of the way the bug was caused. The full release notes for Zerto Replication 5.0 U2 are found here.
Hey Michael. We are on version 6 of Zerto and seem to be experiencing a similar issue. We just finished a datacenter migration involving RDMs, and so far it appears that every RDM that was migrated has corrupted databases on it. Zerto, VMware, and Microsoft haven’t seen any issues when troubleshooting. Have you seen any issues when migrating RDMs? Also, what should we look for in the logs to find the evidence needed to prove this is the issue? I would appreciate any help with this.
Hi Matt, I haven’t seen this issue myself, but look for SCSI sense codes that may be shorter than they should be. You’d have to look at both the VM logs and the host logs. It’s not easy to find issues like this.
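One rough way to apply that advice, once the raw sense bytes have been pulled out of the logs: SCSI sense data carries its own ADDITIONAL SENSE LENGTH in byte 7, so a buffer shorter than 8 plus that value has been cut off somewhere. The sketch below assumes the sense bytes are available as a hex string; the function name `looks_truncated` is hypothetical, and extracting the bytes from VM or host logs is not shown because the log format depends on the source. Treat a hit as a hint rather than proof, since an initiator can legitimately request less sense data than the device has.

```python
def looks_truncated(sense_hex: str) -> bool:
    """Return True if a hex-dumped sense buffer is shorter than its own
    ADDITIONAL SENSE LENGTH field (byte 7) says it should be.
    Hypothetical helper: input is a hex string such as '70 00 06 ...'."""
    buf = bytes.fromhex(sense_hex)  # spaces between byte pairs are allowed
    if len(buf) < 8:
        return True  # too short to even carry the length field
    expected = 8 + buf[7]  # byte 7 counts the bytes that follow it
    return len(buf) < expected


samples = [
    # 18 bytes, additional sense length 0x0a: complete
    "70 00 06 00 00 00 00 0a 00 00 00 00 29 00 00 00 00 00",
    # only 12 bytes but still claims 0x0a more after byte 7: truncated
    "70 00 06 00 00 00 00 0a 00 00 00 00",
]
for s in samples:
    print(looks_truncated(s))  # False, then True
```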