The number of enquiries I’ve been receiving regarding Microsoft Failover Clustering, especially for Microsoft SQL Server databases, has skyrocketed in the past few weeks. They have come from customers and also from partners, including cloud service providers. As a result, I thought I’d write this article to help you understand the current status of support for Microsoft Failover Clustering on VMware vSphere 5.1 (GA) and with regard to some VMware products.
Firstly, there are two main VMware knowledge base articles that outline the support statements for Microsoft Failover Clustering and Microsoft Cluster Services on VMware vSphere. They are as follows:
This article only applies to vSphere 5.1. The rule book has been rewritten with vSphere 5.5; check out my article on vSphere 5.5 Windows Failover Clustering Support.
Clustering and VMware Solutions
In addition to the above, there is specific mention of clustering configurations for the VMware technologies that support it, such as for the vCloud Director SQL Database, which was introduced in vCD 5.1 and covered in my article Clustering Support on vCloud Director and vCenter Databases. The golden rule is this: if VMware does not specifically document a clustering solution as being supported, then it is NOT supported. vCenter Server from version 4.0 to the current 5.1 GA does not support a clustered database, be it Oracle RAC or SQL Server. It has not been tested by VMware and is therefore not supported. This may well change in the future as VMware recognises the need to provide alternative high availability solutions for the vCenter Database, and I will update this article accordingly. Currently, however, the supported high availability solutions for vCenter and its database are VMware HA and vCenter Server Heartbeat. Clustering of the vCenter Server itself is also not supported by VMware; this is covered by KB article 1024051 – Supported vCenter Server high availability options.
Customers with production support who wish to run Oracle RAC for the vCenter database (not SSO, as that doesn’t work) can get support from the VMware Oracle Support Team under VMware’s Expanded Oracle Support Policy. They will, however, be limited by the capabilities of vCenter itself, if any. I do know a number of customers running the vCenter DB (not SSO) on Oracle RAC in an active/passive service configuration, and it has been fine for years. I also expect the official support statement to change in the future as the testing for vCenter and RAC is completed.
Not supported does not always mean something doesn’t work. But it does mean it hasn’t been tested by VMware and therefore VMware can’t stand behind the configuration as a supported solution. If it’s not documented as supported, then it’s not supported.
The Status of Microsoft Failover Clustering Support on VMware vSphere 5.1
VMware has done a lot of work to enhance support for Microsoft Failover Clustering and its predecessor, Microsoft Cluster Services, on VMware vSphere 5.1 to support larger cluster sizes. You can now have up to 5 nodes in a virtual Microsoft Failover Cluster on vSphere 5.1. This is great news for environments where two nodes were not enough, even when combined with the additional availability of VMware HA. I’ve implemented a number of solutions where Microsoft Failover Clustering was used successfully, in cases where it was justified and within the limits that were supported. Strong justification and support constraints are two things I’d like you to think about as you read further.
You can still do hybrid physical and virtual clusters, and you can also still do cluster-in-a-box with VMDKs (for dev / test of cluster functionality itself, not for high availability). VMware Site Recovery Manager is also supported to protect Microsoft Failover Clusters from a DR perspective, and there are a number of different configurations you can use, such as multi-node to single node, or multi-node to multi-node. This really does make DR for the cluster easy and less error prone, and the recovery plan itself, once it is initiated, is automated and provides audit reporting. VMware HA is fully supported; however, VMware recommends you implement anti-affinity rules to ensure cluster nodes are prevented from starting up on the same physical host.
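To make the node limit and the anti-affinity recommendation concrete, here is a minimal sketch in plain Python. It does not talk to vCenter; the function name `check_cluster_placement` and the VM and host names are hypothetical, chosen only for illustration. It takes a mapping of cluster node VMs to hosts and reports two things: more than 5 nodes, and any host carrying more than one node of the same cluster.

```python
from collections import defaultdict

# Maximum nodes in a virtual Microsoft Failover Cluster on vSphere 5.1,
# as stated in this article.
MAX_CLUSTER_NODES = 5


def check_cluster_placement(placement):
    """Check a hypothetical placement of cluster node VMs.

    placement: dict mapping cluster node VM name -> ESXi host name.
    Returns a list of problem descriptions; an empty list means
    neither of the two checks found an issue.
    """
    problems = []

    # Check 1: cluster size against the documented limit.
    if len(placement) > MAX_CLUSTER_NODES:
        problems.append(
            f"{len(placement)} nodes exceeds the {MAX_CLUSTER_NODES}-node limit"
        )

    # Check 2: anti-affinity, i.e. no two nodes on the same host.
    nodes_by_host = defaultdict(list)
    for vm, host in placement.items():
        nodes_by_host[host].append(vm)
    for host, vms in sorted(nodes_by_host.items()):
        if len(vms) > 1:
            problems.append(
                f"anti-affinity violated on {host}: {', '.join(sorted(vms))}"
            )

    return problems


if __name__ == "__main__":
    # Illustrative names only: two nodes share host esx01.
    example = {"sql-node1": "esx01", "sql-node2": "esx02", "sql-node3": "esx01"}
    for problem in check_cluster_placement(example):
        print(problem)
```

In a real environment the anti-affinity rule itself would be defined on the VMware cluster; a check like this is only useful as a quick sanity test of a design document or an inventory export.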
So what are the gotchas or caveats, I hear you ask? Well, there are a few gaps in support that you should be aware of when developing your solution architecture. Later I’ll also cover some of the other valid options you have for high availability and some of the impacts of using Microsoft Failover Clustering. This list is in no particular order.
- Clustering Across Boxes (i.e. traditional clustering for high availability purposes) is not supported with the use of VMDKs or Virtual Mode RDMs (vRDM). You must use Physical Mode RDMs (pRDM) due to the requirement of persistent SCSI reservations.
- Due to the requirement to use pRDMs, there is no support for doing backups with vSphere APIs for Data Protection (vADP). So you must use in-guest agents for backup.
- There is no support for vMotion or DRS with Microsoft Failover Clusters, as they use shared disks and a shared SCSI bus. Any attempt to migrate a cluster node will be met with an error message. This doesn’t mean you can’t deploy a Microsoft Failover Cluster inside a VMware DRS cluster, because you can and it’s fine; it just means that DRS can’t automatically migrate the Microsoft Failover Cluster nodes, because vMotion isn’t supported.
- Windows Server 2012 Failover Clustering is not supported currently.
Period. Not even with in-guest iSCSI. [Updated 21/06/2013] Except with non-shared disk access, in-guest iSCSI, or in-guest SMB storage access. MS SQL Server 2012 on top of Windows Server 2012 with AlwaysOn Availability Groups is supported, as it does not require shared disk. See the VMware KB