Have Your BCP and DR Plans Tested Before the Earth Moves
Yesterday's earthquakes in Wellington, the capital city of New Zealand, bring back vivid memories of the devastating earthquakes in Christchurch and Japan a couple of years ago, from which both countries are still recovering. The Wellington earthquakes were quite severe: the largest measured 6.5, others measured 5.7 and 5.8, and there were over a hundred aftershocks. Luckily there was no loss of life, but there was some damage, and the CBD was shut down for a day while buildings and infrastructure were assessed. I spoke to a few friends and customers after the quakes to check that they were OK. Some of the customers I know were already up and running in their DR sites, or had remote access to their infrastructure from wherever they were. Yesterday's events highlight the need to have your BCP and DR plans in place and tested before disaster strikes. Based on what happened in Wellington, this article briefly covers some things to consider when designing your BCP and DR plans so that they are easier to test, simpler to execute, more reliable and easier to audit.
So that we're all on the same page, I'd like to define BCP and DR. Business Continuity Planning (BCP) is the process of defining how your business will operate after a disaster or other significant business-impacting event, potentially in the absence of IT. A business continuity plan provides a roadmap for continuing operations under adverse conditions. Disaster Recovery (DR) is the process of preparing for the recovery, or continued operation, of vital IT systems during a natural or human-induced disaster. Your DR plans should be based on, and support, your BCP plans. The most common disasters affecting IT and business operations are human-induced, but yesterday's quakes highlight the need to have these plans in place before the earth moves. You should plan for the worst.
Once you have your BCP and DR plans in place, you need to know they will work, and you need to test them regularly. The first principle is simple: if you can't test it, you can't trust it. A DR plan that hasn't been tested can't be relied on either, so your DR and BCP plans must be testable. In terms of making DR simple and testable, two of the most useful tools available are VMware Site Recovery Manager and VMware Horizon View.
VMware Site Recovery Manager provides DR run book automation and allows fully auditable testing of your DR plans without disrupting production. This lets you test your DR much more often and gives you confidence that it will work. When disaster strikes, you can't guarantee that your IT operations team will be available; Site Recovery Manager provides a simple way for even non-technical users to recover critical business systems when the worst happens. VMware Horizon View allows remote access to virtual desktops from anywhere and supports softphone integration or integration into call centre systems. Essentially, provided the systems are available in your datacenters, your employees can work remotely just as if they were in the office.
Learning from Disaster
So what can we learn from the experience in Wellington yesterday? Here are some things worth considering.
Even a minor natural disaster could result in large parts of a city or region being cordoned off and inaccessible for a number of days. In the Wellington CBD, many streets are cordoned off and inaccessible due to building damage. The cordons will likely be removed quickly, but that might not always be the case. If you don't have a strategy for providing remote access to your systems, your business may suffer a more severe impact. This is where VMware Horizon View can help. Some of the customers I spoke to in Wellington were remotely accessing their systems from virtual desktops; they had a DR office as well as access from home for staff. Many human-induced events can also cause cordons to be put in place, such as bomb scares, virus outbreaks and construction activities. Even though the damage in Wellington was not that severe, the cleanup in some buildings will still take days.
Although the quake itself didn't cause widespread severe building damage, broken sprinkler systems and water mains have caused flooding and water damage, including destroying computers in some office buildings. Broken sprinkler systems and water mains in your datacenter or office could destroy your systems even if the disaster itself does not. If your business is located close to the coast in an area prone to tsunamis, you probably don't want your power generators or datacenter to be located below ground.
It’s what you don’t know or don’t plan for that will hurt you. If you’ve assumed physical access to a building as part of your DR process, then that won’t work if the building is cordoned off for a number of days. If you plan for the worst and use technology to simplify your recovery process then you will be able to better adapt to situations you haven’t anticipated.
You need to take a risk-based approach to your BCP and DR planning. By this I mean that the strategies you use to maintain business operations need to be cost-effective, justifiable, and weighted according to the likely business impact and the probability that a particular event will occur. Plan for the events most likely to affect your particular locations. Not everyone is in an earthquake-prone location, but you may be prone to other events. Make sure your organisation has sufficient business interruption insurance.
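One common way to make this weighting concrete is annualised loss expectancy: multiply the estimated yearly probability of an event by its estimated business impact, then rank events by the result. The Python sketch below assumes that heuristic; it is not a formal methodology from this article. The events, probabilities and dollar figures are hypothetical placeholders, not data from Wellington, and the names `Risk`, `prioritise` and `mitigation_is_justified` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One disruptive event; all figures are hypothetical estimates."""
    event: str
    annual_probability: float  # estimated chance the event occurs in a year (0 to 1)
    impact_cost: float         # estimated business impact per occurrence, in dollars

    @property
    def expected_annual_loss(self) -> float:
        # Annualised loss expectancy: probability x impact.
        return self.annual_probability * self.impact_cost


def prioritise(risks: list[Risk]) -> list[Risk]:
    """Rank risks so the largest expected annual loss is planned for first."""
    return sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)


def mitigation_is_justified(risk: Risk, annual_cost: float, loss_reduction: float) -> bool:
    """Return True if a mitigation's yearly cost is below the loss it avoids.

    loss_reduction is the assumed fraction (0 to 1) of the expected loss removed.
    """
    return annual_cost < risk.expected_annual_loss * loss_reduction


# Hypothetical placeholder figures for illustration only.
risks = [
    Risk("Water damage from broken sprinklers", 0.02, 1_500_000),
    Risk("Earthquake cordon closes CBD office", 0.05, 2_000_000),
    Risk("Bomb scare or construction cordon", 0.10, 200_000),
]

for risk in prioritise(risks):
    print(f"{risk.event}: ${risk.expected_annual_loss:,.0f} per year")

# Assume remote desktop access costs $40,000 a year and removes 80% of the cordon loss.
print(mitigation_is_justified(risks[1], 40_000, 0.8))
```

With these made-up numbers the office cordon ranks first, and a remote-access mitigation costing less than the loss it removes passes the check. Real estimates would come from your own business impact analysis.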
Even minor disasters can have a big impact, meaning days of cleanup before you can get back into your office. Even if an earthquake doesn't cause major building damage, water damage and cordons can still severely disrupt operations. If you can't test it, you can't trust it. Have your DR plan in place and tested; you never know when you're going to need it. VMware technology can help make the recovery process much quicker, more reliable, simpler, and easier to audit and test.
This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster. Copyright © 2013 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.