When evaluating a new database platform for a project, it is wise to set a performance baseline on the source system that can then be compared to the destination under similar conditions. There are various ways to establish such a baseline. Ideally, if you are migrating from one database system to another, you can record real transactions and replay them on the new system, which gives the most valid comparison. When this is not possible, you need a standardized benchmark instead. This article discusses using the Swingbench Order Entry schema to evaluate performance. The data presented is based on virtualized Oracle 12c on Oracle Linux 6.6 with the ESXi 5.5 hypervisor.
Swingbench is a free load testing tool developed by Dominic Giles. It includes four benchmarks: OrderEntry, SalesHistory, CallingCircle and StressTest. The following description of the Order Entry schema comes from Dominic's web site:
- OrderEntry is based on the “oe” schema that ships with Oracle 11g/Oracle 12c. It has been modified so that the Spatial and interMedia schemas do not need to be installed. It can be run continuously (that is, until you run out of space). It introduces heavy contention on a small number of tables and is designed to stress interconnects and memory. It is installed using the “oewizard” located in the bin directory. Both pure JDBC and PL/SQL (lower network overhead) variants of the benchmark exist.
Note that this benchmark is designed specifically to test and stress interconnects and memory. Because it is an OLTP benchmark it is also highly CPU intensive, so it will stress CPU, memory and, when testing a RAC environment, the interconnects between RAC nodes. What it does not do, and this is important to understand, is massively stress storage bandwidth or IOPS. Other benchmarks are designed to stress back-end storage, such as the Silly Little Oracle Benchmark (SLOB), which I used for my article All Flash Performance on Web Scale Infrastructure. These characteristics make Order Entry ideal for testing modern converged and hyper-converged infrastructure environments, as it stresses multiple system components at the same time.
Preparing the OE Schema
The preparation of the Order Entry schema is very CPU intensive on the client. Make sure you have enough vCPUs to generate the schema in a reasonable period of time (the more vCPUs the better). To prepare the OE schema you use the oewizard. Supply your credentials, select the database parameters, including username and data file locations and whether to use compression or partitioning (depending on licensed features), and then select a standard-size schema. Unlike older versions of Swingbench, the new version comes with standard sizes; you no longer have to select the number of users and catalog items. I usually test with 100GB or 1TB. If you are using Oracle RAC as your system under test, you can connect either to the service on an individual node or to the Single Client Access Name (SCAN) service. By default the level of parallelism for the OE build is 2x the number of vCPUs on the client. If the client load generator has more than 16 vCPUs, I recommend configuring the level of parallelism at 4x the number of vCPUs to keep them at 100% utilization during the schema build.
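The steps above can also be driven unattended from the command line. The sketch below sizes the build parallelism from the client's vCPU count and assembles an oewizard invocation; the flag names follow recent Swingbench releases (verify with "oewizard -h" on your version), and the connect string, passwords and tablespace name are placeholders for your environment.

```shell
#!/bin/sh
# Sketch: choose oewizard build parallelism from the client's vCPU count,
# then assemble an unattended schema-build command line.

VCPUS=$(nproc)                 # vCPUs on the load-generator client
if [ "$VCPUS" -gt 16 ]; then
  PARALLEL=$((VCPUS * 4))      # 4x keeps a large client at ~100% during the build
else
  PARALLEL=$((VCPUS * 2))      # the default 2x is sufficient on smaller clients
fi
echo "Using $PARALLEL builder threads on $VCPUS vCPUs"

# -cl: command-line mode   -create: build the schema
# -scale 100: ~100GB SOE   -tc: builder thread count
# //scan-name:1521/orcl_svc and the credentials are placeholders
CMD="./oewizard -cl -create -cs //scan-name:1521/orcl_svc \
-dba system -dbap change_me -u soe -p soe -ts SOE \
-scale 100 -tc $PARALLEL -v"
echo "$CMD"                    # echoed as a dry run; execute once verified
```

Echoing the command rather than running it lets you sanity-check the flags against your Swingbench version before kicking off a multi-hour build.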
Running a Test
You execute a test using the swingbench executable (swingbench.bat on Windows) in the bin directory. To achieve the highest transaction rate you can modify the number of users and the think time per transaction; setting think time to zero gives the maximum transaction rate. At that point, adding more users may only increase latency, depending on whether the database is already at its saturation point. For testing standalone databases you can set up the database VM and vary the number of vCPUs and the amount of RAM across tests to see how the system scales up: how many transactions per second are achieved, and at what average latency per transaction. This is measured through the Swingbench interface from the viewpoint of the application, not the storage. Here is an example configuration from Swingbench that I have used successfully to test large database VMs.
This load configuration uses 120 users with no think time at all. The test runs for 1 hour and collects statistics from the database for 30 minutes, which allows you to analyze what was happening in the database during that window. You can also monitor statistics such as I/O and CPU from the guest OS where the database is running. The connection string to the database is the same as the one used when creating the OE schema. See the Oracle documentation for how to connect to an Oracle database using either the JDBC or OCI drivers.
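The same load profile can be expressed with Swingbench's command-line front end, charbench, which is convenient for scripted, repeatable runs. This is a sketch: the flag names follow recent Swingbench releases (verify with "charbench -h"), and the connect string and credentials are placeholders.

```shell
#!/bin/sh
# Sketch: the article's load profile as a charbench invocation.
# -uc: user count            -min/-max: think time per transaction (ms)
# -rt: runtime (hh:mm)       -a: start the run automatically
# -v: per-interval stats columns to print
CMD="./charbench -cs //scan-name:1521/orcl_svc -u soe -p soe \
-uc 120 -min 0 -max 0 -rt 1:00 -a -v users,tpm,tps"
echo "$CMD"                  # echoed as a dry run; execute once verified
```

Setting both -min and -max to 0 reproduces the zero-think-time behaviour discussed above.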
Here are some example images taken during Swingbench tests in a virtualized environment. Note that transaction throughput is measured in both transactions per second and transactions per minute. Individual transaction latency is also measured, and an average across all transactions is displayed. The latency is at the application layer and corresponds to the end-user response time of an interactive system. For each test the database (Oracle 12c) was configured with 192GB RAM and a 128GB SGA; the number of vCPUs was varied between test runs.
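For reference, the memory configuration used in these tests could be applied along these lines; the 128GB value is from the article, and everything else (running as SYSDBA on the database server, using an spfile) is a generic assumption rather than the exact setup used here.

```shell
# Sketch: set a 128GB SGA on a 192GB database VM (run on the DB server
# as a user with SYSDBA rights; assumes the instance uses an spfile).
sqlplus / as sysdba <<'EOF'
ALTER SYSTEM SET sga_max_size = 128G SCOPE = SPFILE;
ALTER SYSTEM SET sga_target   = 128G SCOPE = SPFILE;
-- a restart is required for sga_max_size to take effect
SHUTDOWN IMMEDIATE
STARTUP
EOF
```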
Single VM 18 vCPU:
Single VM 24 vCPU:
Single VM 48 vCPU:
As you can see from the above graphs, performance scales almost linearly as the number of vCPUs on the VM is increased. At 48 vCPUs, performance is approximately 88% of linear scalability, and the single VM was able to reliably execute 20K TPS at 5ms application latency.
The above tests were run with 120 virtual users in Swingbench and zero think time. Higher numbers of virtual users could have been used; transaction throughput would not have been significantly different, but transaction latency would have increased because of the overhead of coordinating the additional virtual users on an already saturated database. During test execution the client VM running Swingbench was using approximately 2% to 6% CPU, so it was very lightly loaded.
The actual performance of the system under test is almost entirely dependent on the physical hardware platform it runs on; there is very little, if any, overhead from the hypervisor, which in this case is ESXi 5.5. Because this test doesn't particularly stress the storage layer, I observed between 12K and 24K IOPS at sub-millisecond latency, with storage throughput between 200MB/s and 400MB/s on average. The hardware used for the testing was a combination of 2 x Nutanix NX4170 quad-socket nodes and 4 x NX9040 all-flash nodes. The results show you can achieve excellent scale-up performance even on a web-scale or hyper-converged architecture with Nutanix.
Swingbench Testing with Oracle RAC
There are two main ways to use Swingbench to test against an Oracle RAC database: a single client connecting via the SCAN service, which distributes client connections across the RAC nodes, or multiple load clients with the Swingbench coordinator and clusteroverview, each connecting to a single RAC node. Because of the way Swingbench works, when you connect via the SCAN service the cluster interconnect will become a bottleneck and limit transaction scalability. To get the best throughput, and to get close to linear scalability as you add RAC nodes and load-generator clients, configure multiple SOE data files and multiple SOE schemas, one per RAC node. Dominic has a ClusterOverview walkthrough on his site that you should check out if you plan to test Oracle RAC. If implemented correctly, you should see almost linear scalability with Oracle RAC when scaling out the number of SOE schemas (one per RAC node) and the number of Swingbench load-generator clients with ClusterOverview and the Swingbench coordinator.
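The scale-out layout described above can be sketched as follows for a two-node cluster. Host names, the service name, the soe1/soe2 schema names and the flags are all placeholders or assumptions; check the -h output of coordinator and charbench on your Swingbench version.

```shell
#!/bin/sh
# Sketch: one SOE schema and one charbench client per RAC node, tied
# together by the Swingbench coordinator. Commands are echoed as a dry
# run, since each runs on a different host.

# 1. Start the coordinator on a central host:
echo "./coordinator &"

# 2. Each load-generator client targets one RAC node and one schema,
#    registering with the coordinator via -co:
CMDS=""
for NODE in 1 2; do
  CMDS="$CMDS./charbench -co coordinator-host -cs //racnode${NODE}:1521/orcl_svc -u soe${NODE} -p soe${NODE} -uc 120 -min 0 -max 0 -rt 1:00 -a
"
done
printf '%s' "$CMDS"

# 3. clusteroverview then attaches to the coordinator to start all
#    clients together and graph the aggregate transaction rate.
```

Keeping each client and schema pinned to its own RAC node avoids the hot-block traffic across the interconnect that the SCAN approach suffers from.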
When I tested using the SCAN service to connect and distribute client connections, I achieved the same transaction throughput and latency figures as with a single Oracle DB, but the interconnect was the bottleneck, and per-node CPU utilization fell in proportion to the number of RAC nodes. That is, if one node at 100% CPU utilization produced 20K TPS, adding another RAC node gave the same 20K TPS at the same transaction latency, but only 50% CPU utilization on each node, because the same hot blocks were being constantly worked over and shipped across the interconnect. As per Oracle best practices, I recommend a low-latency interconnect, preferably 10GbE.
Scaling Up vs Scaling Out
As we can see from the scale-up tests of a single database, you can achieve almost linear scalability with the Swingbench test. But should you scale up the database, especially when you want to support many databases, schemas and users, or should you scale out? And if you scale out, should you use Oracle RAC or multiple standalone databases? The answer depends on your requirements, and no two database environments are the same. As a general rule, though, when virtualizing many databases it is better to have more VMs with fewer schemas and users per VM. The exact numbers depend on your environment, but this rule is based on the experience that hypervisors are very good at scheduling multiple competing workloads for maximum performance, while the database and the OS themselves are not as good at this task. You also do not want so many VMs that management becomes very difficult, especially if you don't have a lot of automation (TIP: invest in management automation!). Survey your database landscape and come up with a design that meets all of your requirements with an acceptable trade-off between manageability and performance in terms of the number of VMs.

What is certain is that virtualized databases have very little overhead compared to their native counterparts, even when running at full utilization. So there should be no performance concerns with virtualizing your largest database systems, provided the underlying hardware is suitable to meet their requirements. Virtualizing business-critical apps is all about reducing risk, improving availability, improving performance, and meeting the business requirements without compromising them. Don't try to virtualize your most important databases on a physical infrastructure platform that isn't up to the task.
Providing a Robust Scalable Network for your Virtualized Environment
A low-latency, high-throughput network is preferred, especially when deploying Oracle RAC, because the cluster interconnect coordinates between the database cluster nodes. Overall, you want a scalable network that can grow with your applications and servers and that offers predictable, consistent latency and throughput between endpoints. A popular network topology that provides these characteristics is the leaf-spine architecture. All of the testing in this article was done on systems connected to a leaf-spine network with 40GbE spine switches connected to 10GbE leaf switches, which in turn connect to the hosts. I gave an overview of my test lab network topology in Configuring Scalable Low Latency L2 Leaf-Spine Network Fabrics with Dell Networking Switches.
There are no performance barriers to virtualizing the largest and most critical databases with modern hypervisors, provided the underlying physical infrastructure is up to the task. There is very little performance overhead for virtualized databases, even when running at 100% utilization. Regardless of what your source and destination systems are, when considering a migration you should baseline both to measure the relative differences in performance with similar database configurations. Swingbench offers a simple, easy way to measure performance using a standardized setup across dissimilar systems, although you need to be aware of how it works and of its limitations. I have successfully used Swingbench to baseline performance between many different types of Unix systems running Oracle databases and x86 when evaluating Unix-to-virtualized-x86 migrations. Many thanks to Dominic Giles for creating a great tool. For those of you who are interested, I used the same hardware and configuration for this article as I did for All Flash Performance on Web Scale Infrastructure. Even under high load, when stressing all system components, the Nutanix platform was rock solid and provided consistent performance.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com. By Michael Webster. Copyright © 2012 – 2015 IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.