Those who are familiar with VMware technology will know vMotion well, and how reliably it works. VMware has worked hard on this for many years. But customers running traditional Unix systems for their Oracle databases are sometimes sceptical, especially when it comes to RAC, under high load, on a monster of a system. To alleviate any concerns, VMware teamed up with Cisco, EMC and Principled Technologies to produce a white paper demonstrating the vMotion of three highly utilized RAC nodes doing thousands of transactions per second without any client disruption. This article very briefly discusses the test and gives you a link to the white paper so you can download it and read it for yourself.
You start with an EMC VMAX Cloud Edition (16 x 2TB SATA, 106 x 300GB FC, 24 x 100GB SSD), UCS B200 M3 Blades (2 x E5-2680 and 384GB RAM), and Cisco Nexus Switches. Add VMware vSphere 5.1 and 3 x Oracle RAC 11g R2 (11.2.0.3) nodes, each with 16 vCPUs and 156GB RAM, and you end up with a whole lot of fun. Then add in Benchmark Factory and a few hundred gigs of data and you’ve got a good platform to start some high performance Multi-NIC vMotion tests.
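To put those sizes in perspective, here is a quick back-of-the-envelope check of how many of these RAC node VMs fit in one host's memory. The RAM figures come from the configuration above; ignoring hypervisor and per-VM memory overhead is my simplifying assumption, and the function names are purely illustrative.

```python
HOST_RAM_GB = 384  # per UCS B200 M3 blade, from the configuration above
VM_RAM_GB = 156    # per Oracle RAC node VM, from the configuration above


def vms_that_fit(host_ram_gb: int, vm_ram_gb: int) -> int:
    """How many VMs of this size fit in host RAM (overhead ignored: an assumption)."""
    return host_ram_gb // vm_ram_gb


def ram_fraction(host_ram_gb: int, vm_ram_gb: int, vm_count: int) -> float:
    """Fraction of host RAM taken by the configured memory of vm_count VMs."""
    return vm_count * vm_ram_gb / host_ram_gb


if __name__ == "__main__":
    fit = vms_that_fit(HOST_RAM_GB, VM_RAM_GB)
    print(f"RAC node VMs that fit on one host: {fit}")
    for count in range(1, fit + 1):
        share = ram_fraction(HOST_RAM_GB, VM_RAM_GB, count)
        print(f"{count} VM(s): {share:.1%} of host RAM")
```

On these numbers a host has room for a second RAC node on configured memory alone, which is what you need if you want to evacuate one host for maintenance.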
The tests showed that even when generating 3654 TPS during the vMotion live migration operations between the 3 x VMware vSphere 5.1 hosts, client connections remained established and able to process transactions. The hosts were subjected to up to 70% CPU utilization during the tests, and 2 x 10G vMotion NICs were used to ensure the migrations could be completed in a reasonable time period. Even under the most extreme three-way vMotion test, which took only 180 seconds to complete, performance returned to pre-migration levels within 350 seconds. Let me ask you this: how long does it take you to live migrate a physical RAC node? Trick question, I know. You get no downtime and no disruption to client connections, allowing you to do maintenance and upgrades while still providing 100% RAC node availability, and a much more rapid response to failure scenarios than could ever be achieved with a physical RAC environment.
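As a rough sanity check on those timings, you can work out the theoretical floor for copying one node's memory across the vMotion links. This is only a sketch: it assumes decimal gigabytes, perfect line-rate throughput on both 10G NICs, and a single pass over memory with no re-copying of pages dirtied during the migration. None of those assumptions hold under a real transactional load, and the function name is just for illustration.

```python
def min_transfer_seconds(memory_gb: float, link_gbps: float, links: int) -> float:
    """Lower bound on the time to copy memory_gb of RAM once over the given links.

    Assumes decimal units and ideal line rate; real migrations take longer.
    """
    gigabits = memory_gb * 8  # gigabytes -> gigabits
    return gigabits / (link_gbps * links)


if __name__ == "__main__":
    # 156GB RAC node VM over 2 x 10G vMotion NICs (figures from the test setup)
    floor_s = min_transfer_seconds(156, link_gbps=10, links=2)
    print(f"Theoretical floor: {floor_s:.1f} s")
```

The floor works out to roughly a minute per node, so a measured 180 seconds for the three-way test under load is well within what you would expect once dirty-page re-copying and protocol overhead are added in.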
Oh, did I mention it’s much simpler to run Oracle RAC on vSphere? You don’t have to worry about storage multipathing or network teaming messing up your storage or RAC client and interconnect networks. This is all handled seamlessly by the hypervisor, which eliminates two areas that commonly cause configuration and stability problems with physical RAC environments. This also proves that, provided you design your converged network infrastructure correctly, you don’t need dedicated switches for the RAC interconnect network. What you do need is sufficient bandwidth and QoS. Jumbo Frames were also used in these tests.
Final Word
You might not vMotion Oracle RAC nodes of this size or bigger every day. You might just do it for non-disruptive preventative maintenance on a server every once in a while, or to do a non-disruptive upgrade to a new server. But knowing you can do it, and that it won’t impact your applications and your users, is critical, especially when we’re talking about business critical applications that could have a large risk and cost associated with them. I’m sure VMware will release more such tests to demonstrate the capability of the platform, and I know that similar tests will be done on vSphere 5.5, which wasn’t available at the time these tests were being run. Hopefully this will give you some confidence to start virtualizing some of your Oracle RAC systems.
I would encourage you all to read the full report – Principled Technologies – vMotion Oracle RAC.
—
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster. Copyright © 2013 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.
Hi Michael,
Great post, and this proves once again the absolute stability and reliability of vSphere under massive load, and really, the ability of vSphere to confidently virtualize any workload. Once virtualized you can use the tools on the platform to provide benefits, as you would for a 1 vCPU VM. And that is the power of vSphere. It's not how big your spanner is... it's whether it works or not.
Paul Meehan