During VMworld USA 2013, where vSphere 5.5 was launched, we heard all about the new enhancements. Some of them were less publicised than others. This article will fill you in on another great reason to consider moving to vSphere 5.5 when it is released. vSphere 5.5 brings with it huge enhancements to the support of Windows Failover Clustering (WFC), previously known as Microsoft Cluster Services (MSCS). This by itself could be a major reason customers choose vSphere 5.5 over previous releases.
You may recall that clustering support in vSphere 5.1 was quite a complex matrix to consider. I tried to explain the various options in my article “The Status of Microsoft Failover Clustering Support on VMware vSphere 5.1”, which was followed shortly thereafter by “Windows Server 2012 Failover Clustering Now Supported By VMware With Some Caveats” after the VMware KB (KB 1037959 Microsoft Clustering on VMware vSphere: Guidelines for Supported Configurations) was updated. The release of vSphere 5.5 once again rewrites the rulebook for Microsoft Failover Clustering. So let’s dive into it a bit and see what’s changed.
vSphere 5.5 introduces full support for the following:
- Windows 2012 Failover Clustering without having to use in-guest storage access!
- FC protocol from the ESXi Host for the Windows 2012 Failover Clustering shared disks (the Cluster pRDMs)
- FCoE protocol from the ESXi Host for the Cluster pRDMs
- iSCSI protocol from the ESXi Host for the Cluster pRDMs
- Round Robin Storage Native Multi-pathing Policy
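
As a rough illustration of the Round Robin item above, the sketch below builds the `esxcli storage nmp device set` command that assigns the `VMW_PSP_RR` path selection policy to each device. It only generates command strings and does not run anything. The device identifiers are hypothetical placeholders, and the helper name `round_robin_commands` is an assumption made for this example. Check your storage vendor’s guidance before changing the policy on cluster disks.

```python
def round_robin_commands(device_ids):
    """Return one esxcli command per device that sets the Round Robin PSP."""
    return [
        f"esxcli storage nmp device set --device {dev} --psp VMW_PSP_RR"
        for dev in device_ids
    ]


if __name__ == "__main__":
    # Hypothetical device identifiers, for illustration only.
    example_devices = ["naa.example0001", "naa.example0002"]
    for cmd in round_robin_commands(example_devices):
        print(cmd)
```

The generated lines would be run in an ESXi shell on each host that can see the devices.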
Clustering support still remains at up to 5 nodes per Windows Failover Cluster, but this isn’t really much of a limitation when you can run as many Windows Failover Clusters as you like on top of one or more VMware vSphere Clusters, provided of course that you don’t exceed the limit of 256 SCSI devices per host. You may also need to set the perennially reserved flag on the RDMs once you have a reasonable number of them, to ensure your hosts boot as fast as possible. This is covered in VMware KB 1016106 (ESXi/ESX hosts with visibility to RDM LUNs being used by MSCS nodes with RDMs may take a long time to boot or during LUN rescan).
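
The two housekeeping points above (the per-host device budget and the perennially reserved flag) can be sketched as follows. This is a minimal sketch, not a definitive procedure: the helper names, the device identifiers and the cluster counts are all hypothetical, and the device limit is passed in rather than hard-coded so you can use the figure from the configuration maximums for your release. The `esxcli storage core device setconfig` command with `--perennially-reserved=true` is the one described in KB 1016106. The script only builds command strings.

```python
def remaining_device_slots(devices_in_use, device_limit):
    """Return how many more SCSI devices a host can see before hitting the limit."""
    if devices_in_use > device_limit:
        raise ValueError("host already exceeds the device limit")
    return device_limit - devices_in_use


def perennially_reserved_commands(rdm_device_ids):
    """Return one esxcli command per RDM device that sets the perennially reserved flag."""
    return [
        f"esxcli storage core device setconfig -d {dev} --perennially-reserved=true"
        for dev in rdm_device_ids
    ]


if __name__ == "__main__":
    # Hypothetical sizing: 10 clusters with 6 shared RDMs each, plus 40 datastore LUNs.
    in_use = 10 * 6 + 40
    print(remaining_device_slots(in_use, device_limit=256))

    # Hypothetical RDM identifiers, for illustration only.
    for cmd in perennially_reserved_commands(["naa.example0001", "naa.example0002"]):
        print(cmd)
```

The flag is set per host, so the generated commands would need to be run on every host that has visibility of the RDM LUNs.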
There are a few things that would be nice to have in VMware vSphere for Windows Failover Clusters and I hope these things are included in future releases:
- Cluster awareness and location awareness within a VMware vSphere Cluster, so that operations make sense from a Clustered VM perspective
- Support for Shared VMDKs, rather than having to use pass-through (physical mode) RDMs
- Support for vMotion and VMware DRS of Windows Failover Cluster Nodes
- Support for vADP style backups
- Perhaps support for higher number of Windows Failover Cluster Nodes instead of limiting it to 5, although as I describe above this really isn’t much of a limitation
I haven’t mentioned VMware HA above because it is already supported and has been for a long time. There were always many good reasons to virtualize Microsoft Failover Cluster systems, including increased availability and easier management of the systems. Now these benefits have been enhanced to include more options, and I’m sure they’ll continue to be enhanced into the future. Just a quick note regarding the new Virtual Hardware v10 in vSphere 5.5: it includes some performance enhancements for Windows VMs and would definitely be the best option when choosing to virtualize Windows Failover Clusters. I’ll cover Virtual Hardware v10 in more detail in a future article.
On a different note, I often get asked about running Windows Failover Clusters on top of a stretched storage solution such as EMC VPLEX, IBM SVC, HP Peer Persistence, or NetApp Metro Clusters, or just running Windows Failover Geo Clusters. For these configurations VMware relies on the support statements from Microsoft and the storage vendors, and also on having a stretched network environment underpinning the overall solution. You would need to carefully consider how many nodes would be supported at each site, and whether the overall complexity of the solution is justified. I would also strongly recommend an in-depth testing and validation process that covers all conceivable failure scenarios. You would also need a test instance of the software and infrastructure in order to achieve the high availability you are seeking. On top of this you’d need to consider DR. Geo Clusters are a high availability solution, not a DR solution. You have to plan for things like corruption of the cluster and how that would be recovered. However, there is no reason you couldn’t operate a stretched Windows Failover Geo Cluster virtualized, and you would achieve additional availability and manageability benefits by doing so.
One of the features introduced in vSphere 5.0 Update 1 was support for a new type of storage behaviour called Permanent Device Loss, or PDL. This is a state common in Stretched Metro Cluster environments (vMSC) where a device becomes unavailable at one site, or where a device is administratively removed and will not be coming back. With PDL handling, vSphere interprets the SCSI sense codes sent back from the storage array and stops sending I/Os to the failed devices. Use of PDL is described in Duncan Epping’s article “vSphere Metro Storage Cluster solutions and PDL’s”. In vSphere 5.5 the PDL behaviour has been further enhanced.
vSphere 5.5 introduces PDL AutoRemove, which automatically removes a device in a PDL state from a host. A device in a PDL state cannot accept any more I/Os, yet it still needlessly counts against the limit of 256 devices per host. Now PDL devices will be automatically removed, which frees up device slots on the host, given that devices in a PDL state are not coming back. In a vMSC environment the devices may come back at some point in the future, in which case you can simply rescan to pick them up.
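
To make the effect on the device budget concrete, here is a toy model of the behaviour described above. It is not a VMware API and does not talk to a host; the class `HostDeviceTable`, its methods and the device names are all hypothetical, invented purely to show how removing PDL devices frees up slots and how a later rescan can add a device back.

```python
class HostDeviceTable:
    """Toy model of a host's storage device list (not a VMware API)."""

    def __init__(self, device_limit, auto_remove_pdl=True):
        self.device_limit = device_limit
        self.auto_remove_pdl = auto_remove_pdl
        self.devices = {}  # device id -> state, either "ok" or "pdl"

    def add(self, device_id):
        """Register a device, as a rescan would for a visible LUN."""
        if device_id not in self.devices and len(self.devices) >= self.device_limit:
            raise RuntimeError("device limit reached")
        self.devices[device_id] = "ok"

    def mark_pdl(self, device_id):
        """Record that the array reported the device as permanently lost."""
        if device_id not in self.devices:
            raise KeyError(device_id)
        if self.auto_remove_pdl:
            # Behaviour described for vSphere 5.5: the entry is dropped.
            del self.devices[device_id]
        else:
            # Earlier behaviour: the dead device keeps occupying a slot.
            self.devices[device_id] = "pdl"

    def free_slots(self):
        """Return how many more devices the host could still accept."""
        return self.device_limit - len(self.devices)


if __name__ == "__main__":
    host = HostDeviceTable(device_limit=4)
    for dev in ("lun-a", "lun-b", "lun-c"):
        host.add(dev)
    host.mark_pdl("lun-b")
    print(host.free_slots())
    host.add("lun-b")  # the device came back and was picked up by a rescan
    print(host.free_slots())
```

Constructing the table with `auto_remove_pdl=False` shows the contrast: the lost device stays in the list and the number of free slots does not change.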
Finally, before I end this article, I would like to mention that vCenter 5.5 now includes support for the SQL Server DB being hosted on a Windows Failover Cluster. However, SQL Server 2012 AlwaysOn Availability Groups aren’t supported for the vCenter Server Database. vCenter 5.5 also supports Oracle RAC.
In addition to the new Jumbo VMDK support in vSphere 5.5, the enhancements to Windows Failover Clustering support provide even more reason to virtualize your business critical applications with confidence. You can be assured that performance has been greatly enhanced, and there will be more articles specifically on performance to come. If you’d like to review more vSphere 5.5 features, I recommend checking out the “What’s New in vSphere 5.5. Platform-Quick Reference” that Alan Renouf has been kind enough to put together.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster. Copyright © 2013 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.