Oracle RAC is a cluster database with a shared cache architecture that provides highly scalable and available database solutions for business critical applications. Oracle RAC is a key component of Oracle enterprise grid architecture and uses Oracle Clusterware for the internode communication required in cluster database environments. Clusterware is the technology that transforms multiple servers into a cluster. In November 2010, Oracle included Oracle RAC 11.2.0.2 and above in its VMware support statement (refer to document ID #249212.1, available on MyOracleSupport.com). Under the VMware Extended Support Policy for Oracle Databases, VMware Technical Support will take total ownership of any Oracle Database problems reported to them, provide access to a team of Oracle DBA resources, and work with Oracle support until resolution.
An Oracle RAC deployment in a VMware HA/DRS environment can fully leverage both DRS, for initial placement and load balancing, and VMware HA, to enhance availability and recoverability. When the shared disks are configured with the multi-writer flag, which disables the simultaneous write protection provided by VMFS (refer to http://kb.vmware.com/kb/1034165), Oracle RAC nodes can be vMotioned without disruption. Using the multi-writer flag avoids the restriction that prevents vMotion of VMs that use SCSI bus sharing, which is what you have to use with Microsoft Cluster Services clusters.
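As a rough illustration of the multi-writer configuration, the sketch below generates the per-disk sharing entries of the form `scsiX:Y.sharing = "multi-writer"` that are added to each RAC node's virtual machine configuration. The helper name and the disk positions used here are hypothetical, chosen only for illustration; check the KB article for the exact procedure for your vSphere version.

```python
def multi_writer_vmx_line(controller: int, unit: int) -> str:
    """Build the configuration entry that marks one shared virtual disk
    as multi-writer. (Hypothetical helper, for illustration only.)"""
    # A VM has at most four SCSI controllers (0-3).
    if not 0 <= controller <= 3:
        raise ValueError("SCSI controller number must be between 0 and 3")
    # Unit 7 is reserved for the controller itself.
    if not 0 <= unit <= 15 or unit == 7:
        raise ValueError("SCSI unit number must be 0-15, excluding 7")
    return f'scsi{controller}:{unit}.sharing = "multi-writer"'


if __name__ == "__main__":
    # Assumed layout: two shared disks on a second SCSI controller.
    shared_disks = [(1, 0), (1, 1)]
    for controller, unit in shared_disks:
        print(multi_writer_vmx_line(controller, unit))
```

The same entries would need to be present on every RAC node virtual machine that shares the disks.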
The following information applies to, and is supported on, both Windows and Linux guest operating systems. The latest VMware Oracle on vSphere Deployment Guide and the multi-writer KB have been updated to reflect support for both major guest OSes. This is a major piece of good news for customers considering virtualizing Oracle RAC.
The benefits to Oracle RAC when deployed with VMDKs on top of a VMFS filesystem with the multi-writer flag are as follows:
- Maintenance Mode with DRS helps avoid planned downtime – You can move Oracle RAC node virtual machines off an ESXi host that requires downtime (hardware replacement/upgrade, firmware upgrade, and the like) with no loss of service.
- Simplifies server refresh cycles – Server refresh cycles can be challenging as the application and operating system typically need to be re-installed. With vMotion, moving an Oracle RAC node onto new hardware can be done in minutes, with no downtime.
- Troubleshooting – Moving Oracle RAC node virtual machines onto a different ESXi host can be an effective tool for troubleshooting suspected issues with underlying hardware.
- Load balancing – In a dedicated VMware cluster for Oracle (where all hosts are licensed), the RAC nodes are free to move between ESXi hosts with DRS to enable optimal load balance.
By leveraging VMware HA, a virtualized Oracle RAC environment can provide even higher availability and faster return to service than a physical Oracle RAC environment. An Oracle RAC node failure (or ESXi host failure) in a virtual deployment results in the same user failover as in a physical RAC environment: user sessions fail over to the remaining nodes (assuming Oracle session failover functionality, TAF – Transparent Application Failover, is configured). When combined with VMware HA, the failed Oracle RAC node can be automatically restarted on another available (and licensed) ESXi host.
A multi-node Oracle RAC deployment is highly available by design, so the use of VMware HA in this environment is not critical for protection against hardware failure. However, VMware HA coexists with and complements a virtualized Oracle RAC installation in the following ways:
- While Oracle RAC maintains database availability, VMware HA can, in the case of an ESXi host failure, automatically restart the failed RAC virtual machine node on another ESXi host where no other Oracle RAC node exists, returning the cluster to full capacity as soon as possible (as previously mentioned). As of vSphere 4.1, affinity rules can enforce placement of Oracle RAC virtual machines so that they reside on separate ESXi hosts. Note that the restart of the Oracle RAC node virtual machine on a new ESXi host after a VMware HA event requires that the target ESXi host be licensed. For more information see VMware High Availability (HA): Deployment Best Practices at http://www.vmware.com/resources/techresources/10166.
- In the case of an Oracle RAC node OS failure, VMware HA VM Monitoring can detect the OS failure and automatically restart the VM.
- Automatic restart of Oracle RAC virtual machine nodes can also be disabled for ESXi host failures. In this case, impacted user sessions fail over and continue to be processed on the remaining nodes in a "degraded" state (because there are fewer nodes). This is only temporary until the failed server is repaired and brought back into the cluster, at which point the failed virtual machine node can be manually restarted.
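The restart constraints described in the list above (the target host must be available, licensed, and not already running another RAC node) can be sketched as a small selection function. This is a toy model for reasoning about placement, not how VMware HA is implemented; the `Host` class, its fields, and `pick_restart_host` are all hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical, simplified view of an ESXi host."""
    name: str
    licensed: bool                 # licensed to run Oracle
    alive: bool = True             # host is up and reachable
    rac_nodes: set[str] = field(default_factory=set)  # RAC VMs on this host


def pick_restart_host(hosts: list[Host]) -> Host | None:
    """Return the first host that can take a failed RAC node, or None.

    A candidate must be alive, licensed, and must not already run
    another Oracle RAC node.
    """
    for host in hosts:
        if host.alive and host.licensed and not host.rac_nodes:
            return host
    # No valid target: the RAC cluster keeps running with fewer nodes
    # until the failed server is repaired.
    return None


if __name__ == "__main__":
    hosts = [
        Host("esxi-01", licensed=True, alive=False, rac_nodes={"rac1"}),
        Host("esxi-02", licensed=True, rac_nodes={"rac2"}),
        Host("esxi-03", licensed=False),
        Host("esxi-04", licensed=True),
    ]
    target = pick_restart_host(hosts)
    print(target.name if target else "no valid host; run degraded")
```

When the function returns `None`, the situation corresponds to the last bullet: sessions continue on the remaining nodes and the failed node is restarted manually later.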
For a cluster of virtual machines across physical hosts, anti-affinity rules should be considered to ensure that only one RAC node of a RAC cluster runs on any given ESXi host. The advanced option for VMware DRS, ForceAffinePoweron = 1, can be used to enable strict enforcement of the affinity and anti-affinity rules that are created.
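To make the anti-affinity requirement concrete, the following minimal sketch checks a VM-to-host placement and reports any ESXi host that runs more than one node of the same RAC cluster. It only validates a mapping you supply; it does not talk to vCenter, and the function name and the VM and host names are hypothetical.

```python
from collections import defaultdict


def anti_affinity_violations(placement: dict[str, str]) -> dict[str, list[str]]:
    """Given {rac_vm_name: esxi_host_name} for ONE RAC cluster, return
    {host: [vms]} for every host that runs more than one RAC node."""
    vms_by_host: dict[str, list[str]] = defaultdict(list)
    for vm, host in placement.items():
        vms_by_host[host].append(vm)
    return {
        host: sorted(vms)
        for host, vms in vms_by_host.items()
        if len(vms) > 1
    }


if __name__ == "__main__":
    # Assumed example placement: rac1 and rac2 share a host, which breaks the rule.
    placement = {"rac1": "esxi-01", "rac2": "esxi-01", "rac3": "esxi-02"}
    print(anti_affinity_violations(placement))
```

An empty result means the placement satisfies the rule that only one RAC node of the cluster runs on each host.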
For additional information on virtualizing Oracle visit my Oracle Page.
Thanks to Kannan Mani and others for all their great work on virtualizing Oracle Databases.
For further reading I recommend that you check out these documents:
Oracle Databases on VMware vSphere Part 1 – By Kannan Mani
Oracle Databases on VMware vSphere Part 2 – Oracle RAC on VMFS with ASM – By Kannan Mani
Oracle Databases on VMware vSphere Part 3 – Oracle RAC Node vMotion – By Kannan Mani
Episode I: GMP Takes on Oracle Apps and RAC Database Virtualization
Episode II: GMP Completes Oracle Apps and RAC Database Virtualization
Oracle Databases on VMware – Best Practices Guide
Deployment of Oracle Databases on VMware Infrastructure
Oracle Databases on VMware – Workload Characterization Study White Paper
Maximizing Oracle Database Performance and Minimizing Licensing Costs in Virtualized Environments
Virtualizing Performance-Critical Database Applications in VMware® vSphere™
Oracle RAC Performance on VMware vSphere 4.1
Oracle Database Virtualisation Resources at VMware.com
—
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com. By Michael Webster. Copyright © 2012 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.
Reading through this blog and other documents such as http://www.vmware.com/files/pdf/partners/oracle/v… you'll find the OS mentioned is Linux. My open question to anyone: will a virtualized Oracle RAC on Windows 2008 using the multi-writer method work in a production environment?
Hi Todd, I've done a bit of testing with a two-node 11g R2 RAC environment running on Windows 2008 R2. It appears to work the same as Linux when using the multi-writer flag correctly. However, if you're planning to implement on vSphere 5.0, it would pay not to use the VMXNET3 NIC at this stage, as there are a few bugs that may impact stability and connectivity if using large pages when using it on Windows (which you might normally implement for the interconnect private network). VMXNET2 or E1000 still work fine, although no large pages with E1000. It's only a problem with the updated vSphere 5 version of VMware Tools. KB 2006277 explains the situation. It's expected to be fixed in 5.0 U1. Otherwise, happy testing.
Hi Todd, I wanted to follow up on this. I can now confirm for certain that Oracle RAC on Windows using the multi-writer flag is fully supported and workable. There is a great blog post by Bob Goldsand, one of the VMware guys, that mentions this. You can read his blog at http://blogs.vmware.com/alliances/2011/12/episode….
The initial stability issues with VMXNET3 on vSphere 5 have been fixed in vSphere 5.0 Update 1.