Disaster Recovery and High Availability
There is always something to learn when nature creates a ruckus, as in the recent Japan earthquake and tsunami, whose effects cascaded into a radiation leak, this time through man-made nuclear reactors. Most organizations create disaster recovery (DR) processes and DR data centers to ensure that the business runs as usual, or with minimum downtime. Most IT platforms shift activity to the DR center if the main infrastructure data center is affected by such a disaster, or even by something on a much smaller scale, such as a simple power failure.
A simple question: what is an effective way to plan and create a strategy for DR and high availability? There are some traditional but effective principles.
1) DR centers need to be geographically separated based on a risk assessment of the disasters each center is prone to. For example, having one data center in Houston, Texas and the other in Morgan City, Louisiana is not of much help, as both are similarly prone to hurricanes. Similarly, one center in Indonesia and the other in Hawaii, though geographically separated, have similar risk profiles. The risk profiles of the main and DR centers should differ and complement each other.
2) A DR center is not a fault-tolerant replacement. This is a mistake many organizations make: using the DR data center as a fault-tolerant pair for their applications. For example, on any failure of an application in the main data center, the corresponding application is failed over to, or started in, the DR data center to maintain high availability. Though this is an easy choice, it is not the objective of a DR center. Ideally, the DR center should hold a carbon copy of the highly critical applications (one doesn’t need all applications in a disaster), with the same configuration down to the hostname and network ID, so that when it starts it acts as the main infrastructure provider for as long as it takes to bring the main center back up, without requiring changes to any application. The principle is to keep the logical IT layer the same for the business process; whether the physical machines run in the main data center or in DR should not matter. The tendency is to make some use of the DR center, since there is a cost to setting it up. But DR is insurance against disaster, and the investment should not be expected to generate returns.
3) High availability (HA), on the other hand, does not directly or indirectly translate to having a backup in a remote center such as DR. HA provides a mechanism for zero downtime to the business during normal operations. For example, if an instance of App A goes down due to a glitch triggered by a software or hardware problem, another instance of App A, configured on a different machine in the same data center and probably in the same network domain, picks up the pending work and services the business operations.
4) A load-balanced configuration (also called active-active) serves a different need than a fault-tolerant configuration (also called active-passive or swappable). Most of the time the two are used together as a hybrid, which only increases the infrastructure footprint and adds unnecessarily to the cost.
5) Last but not least, cloud computing is a viable alternative, as captured in the statement “Cloud computing platforms will take most of the burdens from the organization on this front.”
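The risk-profile check in point 1 can be sketched in a few lines of Python. This is only an illustration: the hazard sets below are assumptions made for the example (the hurricane entries follow the Houston and Morgan City case above, while the inland site and its hazard are entirely hypothetical), and a real assessment would draw on proper hazard data.

```python
# Illustrative hazard sets per site. These are assumptions for the sketch,
# not authoritative risk data; "Hypothetical inland site" is made up.
SITE_HAZARDS = {
    "Houston, Texas": {"hurricane"},
    "Morgan City, Louisiana": {"hurricane"},
    "Hypothetical inland site": {"tornado"},
}


def shared_hazards(primary, dr, hazards):
    """Return the hazards that the primary and DR sites have in common."""
    return hazards[primary] & hazards[dr]


def is_complementary(primary, dr, hazards):
    """A DR site complements the primary if their risk profiles do not overlap."""
    return not shared_hazards(primary, dr, hazards)


if __name__ == "__main__":
    for dr_site in ("Morgan City, Louisiana", "Hypothetical inland site"):
        overlap = shared_hazards("Houston, Texas", dr_site, SITE_HAZARDS)
        print(dr_site, "shares", sorted(overlap) or "nothing", "with Houston")
```

Any non-empty overlap is a signal that a single event could take out both centers, which is exactly what the DR site is supposed to insure against.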
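To make the difference between the two configurations in points 3 and 4 concrete, here is a minimal Python sketch. The `Instance` class, the routing functions, and the instance names are all hypothetical, invented for this illustration; real deployments would rely on a load balancer or cluster manager rather than hand-written routing.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A hypothetical application instance with a simple health flag."""
    name: str
    healthy: bool = True


def route_active_passive(primary, standby):
    """Fault-tolerant pair: the standby serves only when the primary is down."""
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby
    raise RuntimeError("no healthy instance available")


def route_active_active(instances, n_requests):
    """Load-balanced set: spread requests round-robin over healthy instances."""
    healthy = [inst for inst in instances if inst.healthy]
    if not healthy:
        raise RuntimeError("no healthy instance available")
    return [healthy[k % len(healthy)].name for k in range(n_requests)]


if __name__ == "__main__":
    a1, a2 = Instance("app-a-1"), Instance("app-a-2")
    print(route_active_passive(a1, a2).name)   # primary serves while healthy
    a1.healthy = False
    print(route_active_passive(a1, a2).name)   # standby takes over
    print(route_active_active([Instance("x"), Instance("y")], 4))
```

In the active-passive case the second instance sits idle until a failure, while in the active-active case every healthy instance carries traffic all the time. Both instances would live in the same data center; neither pattern is a substitute for the DR site.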


