Oracle RAC is a cluster database with a shared cache architecture that provides highly scalable and available database solutions for business critical applications. Oracle RAC is a key component of Oracle enterprise grid architecture and uses Oracle Clusterware for the internode communication required in cluster database environments. Clusterware is the technology that transforms multiple servers into a cluster. In November 2010, Oracle included Oracle RAC 11.2.0.2 and above in its VMware support statement (refer to document ID #249212.1, available on MyOracleSupport.com). Under the VMware Extended Support Policy for Oracle Databases, VMware Technical Support will take total ownership of any Oracle Database problems reported to them, as well as providing access to a team of Oracle DBA resources and working with Oracle support until resolution.
An Oracle RAC deployment in a VMware HA/DRS environment can fully leverage both DRS for initial placement and load balancing, and VMware HA to enhance availability and recoverability. When configured to disable the simultaneous write protection provided by VMFS using the multi-writer flag (refer to http://kb.vmware.com/kb/1034165), Oracle RAC nodes can be vMotioned without disruption. Using the multi-writer flag eliminates the problem of being unable to vMotion virtual machines that use SCSI bus sharing, which Microsoft Cluster Services clusters require.
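To make the multi-writer setting a little more concrete: the KB describes adding a `scsiX:Y.sharing = "multi-writer"` entry for each shared disk to the configuration of every RAC node virtual machine. The sketch below only renders those entries as text. The function name and the controller and unit numbers are hypothetical, chosen for illustration, and the KB remains the authority on the exact procedure for your vSphere version.

```python
def multi_writer_entries(shared_disks):
    """Render one sharing entry per shared disk.

    shared_disks: iterable of (controller, unit) SCSI addresses, e.g. (1, 0).
    The addresses are assumptions; use the ones your shared VMDKs are attached to.
    """
    return [
        f'scsi{controller}:{unit}.sharing = "multi-writer"'
        for controller, unit in shared_disks
    ]


# Hypothetical layout: three shared disks on SCSI controller 1.
for line in multi_writer_entries([(1, 0), (1, 1), (1, 2)]):
    print(line)
```

The same entries would need to be present on each RAC node that attaches the shared disks.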
The following information applies to, and is supported on, both Windows and Linux guest operating systems. The latest VMware Oracle on vSphere Deployment Guide and the multi-writer KB have been updated to reflect support for both guest operating systems. This is good news for customers considering virtualizing Oracle RAC.
The benefits to Oracle RAC when deployed with VMDKs on top of a VMFS filesystem with the multi-writer flag are as follows:
- Maintenance Mode with DRS helps avoid planned downtime – You can move Oracle RAC node virtual machines off an ESXi host that requires downtime (hardware replacement/upgrade, firmware upgrade, and the like) with no loss of service.
- Simplifies server refresh cycles – Server refresh cycles can be challenging as the application and operating system typically need to be re-installed. With vMotion, moving an Oracle RAC node onto new hardware can be done in minutes, with no downtime.
- Troubleshooting – Moving Oracle RAC node virtual machines onto a different ESXi host can be an effective tool for troubleshooting suspected issues with underlying hardware.
- Load balancing – In a dedicated VMware cluster for Oracle (where all hosts are licensed), the RAC nodes are free to move between ESXi hosts with DRS to achieve optimal load balance.
By leveraging VMware HA, a virtualized Oracle RAC environment can provide even higher availability and faster return to service than a physical Oracle RAC environment. An Oracle RAC node failure (or ESXi host failure) in a virtual deployment results in the same user failover as in a physical RAC environment: user sessions fail over to the remaining nodes, assuming Oracle session failover functionality (TAF, Transparent Application Failover) is configured. When combined with VMware HA, the failed Oracle RAC node can be automatically restarted on another available (and licensed) ESXi host.
A multi-node Oracle RAC deployment is highly available by design, so the use of VMware HA in this environment is not critical for protection against hardware failure. However, VMware HA coexists with and complements a virtualized Oracle RAC installation in the following ways:
- While Oracle RAC maintains database availability, VMware HA can automatically restart the failed RAC virtual machine node after an ESXi host failure on another ESXi host where no other Oracle RAC node exists, returning the cluster to full capacity as soon as possible (as previously mentioned). As of vSphere 4.1, affinity rules can be used to enforce placement of Oracle RAC virtual machines so that they reside on separate ESXi hosts. Note that the restart of the Oracle RAC node virtual machine on a new ESXi host after a VMware HA event requires that the target ESXi host be licensed. For more information see VMware High Availability (HA): Deployment Best Practices at http://www.vmware.com/resources/techresources/10166.
- In the case of an Oracle RAC node OS failure, VMware HA VM Monitoring can detect the OS failure and automatically restart the VM.
- Alternatively, automatic restart of Oracle RAC virtual machine nodes can be disabled for ESXi host failures. In this case, impacted user sessions fail over and continue to be processed on the remaining nodes in a "degraded state" (because there are fewer nodes). This is only temporary, until the failed server is repaired and brought back into the cluster, at which point the failed virtual machine node can be manually restarted.
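The placement constraints in the list above (the restart target must be licensed and should not already host another RAC node) can be sketched as a simple filter. This is only a toy model of the reasoning, not how VMware HA actually selects a host. The function name, the host names, and the dictionary layout are all hypothetical.

```python
def eligible_restart_hosts(hosts, failed_host):
    """Return the names of hosts that could take the restarted RAC node.

    hosts: dict mapping host name -> {"licensed": bool, "rac_nodes": set of VM names}
    A host qualifies if it is not the failed host, is licensed for Oracle,
    and does not already run another RAC node.
    """
    return sorted(
        name
        for name, info in hosts.items()
        if name != failed_host and info["licensed"] and not info["rac_nodes"]
    )


# Hypothetical four-host cluster running a two-node RAC.
cluster = {
    "esx01": {"licensed": True, "rac_nodes": {"rac1"}},
    "esx02": {"licensed": True, "rac_nodes": {"rac2"}},
    "esx03": {"licensed": True, "rac_nodes": set()},
    "esx04": {"licensed": False, "rac_nodes": set()},
}
print(eligible_restart_hosts(cluster, failed_host="esx01"))  # ['esx03']
```

If the filter returns no hosts, the surviving RAC nodes simply carry the load until capacity is restored, which matches the manual-restart case described above.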
For a cluster of virtual machines across physical hosts, anti-affinity rules should be considered to ensure that only one RAC node of a RAC cluster runs on an ESXi host. The advanced option for VMware DRS, ForceAffinePoweron = 1 can be used, which enables strict enforcement of the affinity and anti-affinity rules that are created.
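The intent of the anti-affinity rule can be checked with a few lines of code. The sketch below takes a mapping of RAC virtual machines to ESXi hosts and reports any host running more than one RAC node. It is an illustrative check only and does not call any VMware API. The function name and the VM and host names are hypothetical.

```python
from collections import defaultdict


def anti_affinity_violations(placement):
    """Find hosts that run more than one RAC node.

    placement: dict mapping RAC VM name -> ESXi host name.
    Returns a dict of host name -> sorted list of RAC VMs sharing that host.
    """
    by_host = defaultdict(list)
    for vm, host in placement.items():
        by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical placement: rac1 and rac2 have ended up on the same host.
print(anti_affinity_violations({"rac1": "esx01", "rac2": "esx01", "rac3": "esx02"}))
# {'esx01': ['rac1', 'rac2']}
```

An empty result means every RAC node of the cluster sits on its own ESXi host, which is the state the anti-affinity rule is meant to maintain.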
For additional information on virtualizing Oracle visit my Oracle Page.
Thanks to Kannan Mani and others for all their great work on virtualizing Oracle Databases.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com. By Michael Webster. Copyright © 2012 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.