Why Your Recovery Should Be At Least As Good As Your Backup

Fast, Complete, Automated Restores Minimize the Effects of Any Unplanned Downtime Event

In our ongoing contacts with customers, we have found that companies tend to pay more attention to backup than to recovery. Having reliable backup mechanisms in place is vital, but it can create an illusory comfort zone: "I'm covered. My data is protected. It cannot be lost irretrievably." This is much like knowing that you're insured by a good company, and the premiums have been paid; you're probably not thinking about what happens after the fire department leaves.

There may also be a subconscious belief that It Won't Happen Here - a form of denial that stands in opposition to Murphy's Law. We can be realistic about this without being negative. We don't have to list the hurricanes, tsunamis, flash floods, and terrorist attacks of recent memory. It's enough to recognize that disasters - local, national, or international - absolutely do happen. They shut down entire data centers. But it's still a disaster for your company if some event knocks out a single system and shuts down the corporate web site. It doesn't need to be a missile from a foreign power. It could simply be the collapse of an aging disk array.

The point is, when IT systems go down, and the emphasis shifts instantly from backup to recovery, there is great pressure on the IT department to re-establish operations swiftly and effectively. The revenue pipeline dries up. Customers turn impatiently to other sites. The company's image begins to suffer. In a worst-case scenario, a few hours of downtime can ruin a fiscal quarter or even a company.

We have found that many companies are at a disadvantage in this situation because they have given less thought to recovery than to backup. The critical question here is, how quickly can your company recover services?

The challenges of doing a fast recovery
Let's look at what happens in a typical data center when a system goes down without warning and has to be restored. The immediate challenge is to find and inform a person who has the capability to do a recovery. Hopefully, the designated person is not off sick, on vacation, in training, or applying for a job elsewhere, because the process is complicated and requires familiarity with the environment.

Traditional recoveries require the use of a variety of complex tools from several vendors. Many require scripting. Configurations and patches have usually not been tracked. Recovery of Windows systems to dissimilar hardware, in particular, has been very challenging. Even with procedures in place, the recovery takes too long. Business suffers. And because recoveries are so complex, and the pressure to perform is extreme, the possibility of error is proportionately greater. The administrator may have to go back and repeat steps to get it right.
 
Some of this complexity could be avoided by standardizing hardware purchases. But how many IT managers have the budget to buy precisely the same model EMC CLARiiON to simplify a potential future recovery procedure? In any case, there's no guarantee that any two systems with the same model numbers and specifications won't have different NICs, motherboards, or disk drives. Complexity rules. And the downtime clock keeps ticking.

Best practices: Automated total recovery
On the other hand, software technology now exists that shrinks the entire recovery process to a few clicks on a GUI. Its initial advantage is that it's integrated with the backup software. It records updated images of the environment, including machine configuration, disk layouts, and TCP/IP configuration, during regularly scheduled backups.
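To make the idea concrete, here is a minimal sketch of the kind of record such a tool might save alongside each scheduled backup. It is an illustration only: the class names, field names, and JSON format are assumptions made for this example and do not describe any particular product's format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DiskLayout:
    """One disk or volume as it existed at backup time (hypothetical fields)."""
    device: str
    size_gb: int
    mount_point: str
    filesystem: str


@dataclass
class NetworkConfig:
    """TCP/IP settings needed to bring the system back onto the network."""
    hostname: str
    ip_address: str
    netmask: str
    gateway: str


@dataclass
class ClientSnapshot:
    """Everything needed to rebuild the machine, not just its data."""
    machine_model: str
    os_version: str
    patches: List[str]
    disks: List[DiskLayout]
    network: NetworkConfig


def save_snapshot(snapshot: ClientSnapshot) -> str:
    """Serialize the snapshot so it can be stored with the backup image."""
    return json.dumps(asdict(snapshot), indent=2, sort_keys=True)


def load_snapshot(text: str) -> ClientSnapshot:
    """Rebuild the snapshot object from its stored JSON form."""
    raw = json.loads(text)
    return ClientSnapshot(
        machine_model=raw["machine_model"],
        os_version=raw["os_version"],
        patches=list(raw["patches"]),
        disks=[DiskLayout(**disk) for disk in raw["disks"]],
        network=NetworkConfig(**raw["network"]),
    )
```

Because the record is refreshed on every scheduled backup, a restore can start from the configuration as it last existed rather than from whatever an administrator remembers or has written down.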

Secondly, it bypasses the many and varied tools that software and hardware vendors supply with their systems, because these tools tend to focus on their products alone. They can't recover the environment as a whole, so they simply add complexity to the recovery process. But the best-practices technology is independent of these systems and provides an overview of the complete environment from a centralized console. It has the power to recover all failed systems completely, including configurations and patches.

Now let's do a comparison between traditional and best-practices recovery in a typical scenario. The administrator who is charged with recovery responsibility is enjoying his break when he receives an alert on his pager that a system is down. He hustles down the hall to start the restore process. He and a team of administrators with a variety of skills (they all happen to be available, in this hypothetical case) perform a series of time-consuming operations:
  1. Collect media
  2. Repair hardware
  3. Reload OS
  4. Reload backup software
  5. Load tapes and restore data
In the process, they sit through four agonizingly long reboots. Elapsed time, if the team is knowledgeable and efficient: at least an hour. There is no guarantee that the process will be error-free or that configurations and patches will be recovered. If multiple systems are down, restore time will probably stretch out to several hours.

Now here's how the administrator's procedure would look with the new recovery technology:
  1. Repair hardware.
  2. Select the failed system on the console and click Prepare to Restore.
  3. Reboot. 
He's done before his coffee gets cold. The system is up and in production again. Hour-long restores are completed in just minutes. Errors: zero, because the process is totally automated. Lost customer loyalty: probably none.

The recovery technology has configured disks, logical volumes, and file systems; mounted the file systems; and restored files, including the operating system, configuration data, applications, and user files.

All configurations and patches are in place as they were before the system went down, even for dissimilarly configured Windows systems. And the recovery technology could have carried out the same procedure in parallel for multiple restores if the entire data center had been affected.
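The sequence just described, and the idea of running it for several systems at once, can be sketched in a few lines. This is a toy model under stated assumptions: the step names are taken from the description above, while `restore_system`, `restore_all`, and the host names are hypothetical, and each step is a placeholder rather than a real restore operation.

```python
from concurrent.futures import ThreadPoolExecutor

# Ordered steps, as listed in the article's description of the automated restore.
RESTORE_STEPS = (
    "configure disks",
    "configure logical volumes",
    "configure file systems",
    "mount file systems",
    "restore files",
)


def restore_system(hostname):
    """Run every restore step in order for one system (placeholder work only)."""
    completed = []
    for step in RESTORE_STEPS:
        # A real tool would do the actual work here; this sketch only records it.
        completed.append(step)
    return hostname, completed


def restore_all(hostnames, max_workers=4):
    """Restore several failed systems in parallel, one worker per system."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(restore_system, hostnames))


if __name__ == "__main__":
    for host, steps in restore_all(["web01", "db01", "mail01"]).items():
        print(host, "->", ", ".join(steps))
```

The point of the sketch is the shape of the process: the steps are fixed and ordered per system, so nothing depends on an administrator's memory, and independent systems do not have to wait for one another.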

The benefits from this approach are clear and significant. It eliminates fire drills and preserves uptime. It keeps revenue flowing, keeps customers happy, and keeps the company productive. It is a realistic, proactive way to maintain continuity of operations.


Eric Schou is a Senior Product Marketing Manager with Symantec Corporation. He is currently a part of the Veritas NetBackup Product Marketing team. Before joining Symantec, Eric spent over ten years in the storage industry, working for both Maxtor Corp. and Quantum Corp. in a marketing capacity. Prior to that, he worked for Arrow Electronics for five years as a Senior Sales Representative, managing tier one distribution customers.