Fast, Complete, Automated Restores Minimize the Effects of
Any Unplanned Downtime Event
In our ongoing contacts with customers, we have
found that companies tend to pay more attention to backup than to recovery.
Having reliable backup mechanisms in place is vital, but it can create an
illusory comfort zone: "I'm covered. My data is protected. It cannot be lost
irretrievably." This is much like knowing that you're insured by a good
company, and the premiums have been paid; you're probably not thinking about
what happens after the fire department leaves.
There may also be a subconscious belief that It
Won't Happen Here - a form of denial that stands in opposition to Murphy's Law.
We can be realistic about this without being negative. We don't have to list
the hurricanes, tsunamis, flash floods, and terrorist attacks of recent memory.
It's enough to recognize that disasters - local, national, or international -
absolutely do happen. They shut down entire data centers. But it's still a
disaster for your company if some event knocks out a single system and shuts
down the corporate web site. It doesn't need to be a missile from a foreign power.
It could simply be the collapse of an aging disk array.
The point is, when IT systems go down, and the
emphasis shifts instantly from backup to recovery, there is great pressure on
the IT department to re-establish operations swiftly and effectively. The
revenue pipeline dries up. Customers turn impatiently to other sites. The
company's image begins to suffer. In a worst-case scenario, a few hours of
downtime can ruin a fiscal quarter or even a company.
We have found that many companies are at a disadvantage
in this situation because they have given less thought to recovery than to
backup. The critical question here is, how quickly can your company recover
services?
The challenges of doing a fast recovery
Let's look at what happens in a typical data
center when a system goes down without warning and has to be restored. The
immediate challenge is to find and inform a person who has the capability to do
a recovery. Hopefully, the designated person is not off sick, on vacation, in
training, or applying for a job elsewhere, because the process is complicated
and requires familiarity with the environment.
Traditional recoveries require the use of a
variety of complex tools from several vendors. Many require scripting.
Configurations and patches have usually not been tracked. Recovery of Windows
systems to dissimilar hardware, in particular, has been very challenging. Even
with procedures in place, the recovery takes too long. Business suffers. And
because recoveries are so complex, and the pressure to perform is extreme, the
possibility of error is proportionately greater. The administrator may have to
go back and repeat steps to get it right.
Some of this complexity could be avoided by
standardizing hardware purchases. But how many IT managers have the budget to
buy precisely the same model EMC CLARiiON to simplify a potential future
recovery procedure? In any case, there's no guarantee that any two systems with
the same model numbers and specifications won't have different NICs,
motherboards, or disk drives. Complexity rules. And the downtime clock keeps ticking.
Best practices: Automated total recovery
On the other hand, software technology now
exists that shrinks the entire recovery process to a few clicks on a GUI. Its
initial advantage is that it's integrated with the backup software. It records
updated images of the environment, including machine configuration, disk
layouts, and TCP/IP configuration, during regularly scheduled backups.
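As a rough illustration of the idea (not the product's actual data format), the configuration captured with each scheduled backup can be pictured as a small timestamped record per machine. The function names, field names, and sample values below are all hypothetical:

```python
import json
from datetime import datetime, timezone


def build_snapshot(hostname, disk_layout, tcpip, taken_at=None):
    """Build a hypothetical configuration record to store alongside a backup."""
    return {
        "hostname": hostname,
        "taken_at": taken_at or datetime.now(timezone.utc),
        "disk_layout": disk_layout,  # e.g. device name -> size
        "tcpip": tcpip,              # e.g. address, netmask, gateway
    }


def latest_snapshot(snapshots, hostname):
    """Return the most recent record for a host, or None if there is none."""
    candidates = [s for s in snapshots if s["hostname"] == hostname]
    return max(candidates, key=lambda s: s["taken_at"]) if candidates else None


def to_json(snapshot):
    """Serialize a record; the timestamp is written out as a string."""
    return json.dumps(snapshot, default=str, sort_keys=True)
```

Because a fresh record is taken at every scheduled backup, a restore can work from the newest one instead of relying on anyone's memory of how the machine was configured.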
Secondly, it bypasses the many and varied tools
that software and hardware vendors supply with their systems, because these
tools tend to focus on their products alone. They can't recover the environment
as a whole, so they simply add complexity to the recovery process. But the
best-practices technology is independent of these systems and provides an
overview of the complete environment from a centralized console. It has the
power to recover all failed systems completely, including configurations and
patches.
Now let's do a comparison between traditional
and best-practices recovery in a typical scenario. The administrator who is
charged with recovery responsibility is enjoying his break when he receives an
alert on his pager that a system is down. He hustles down the hall to start the
restore process. He and a team of administrators with a variety of skills (they
all happen to be available, in this hypothetical case) perform a series of
time-consuming operations:
- Collect media
- Repair hardware
- Reload OS
- Reload backup software
- Load tapes and restore data
In the process, they make four agonizingly long
reboots. Elapsed time, if the team is knowledgeable and efficient: at least an
hour. There is no guarantee that the process will be error-free or that
configurations and patches will be recovered. If multiple systems are down,
restore time will probably stretch out to several hours.
Now here's how the administrator's procedure
would look with the new recovery technology:
- Repair hardware.
- Select the failed system on the console and click Prepare to Restore.
- Reboot.
He's done before his coffee gets cold. The
system is up and in production again. Hour-long restores are completed in just
minutes. Errors: zero, because the process is totally automated. Lost customer
loyalty: probably none.
The recovery technology has configured disks,
logical volumes, and file systems; mounted the file systems; and restored
files, including the operating system, configuration data, applications, and
user files.
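The order of those steps matters, since each depends on the one before it. A minimal sketch, assuming a hypothetical SystemImage record and a restore_plan function that only lists the steps without executing anything:

```python
from dataclasses import dataclass


@dataclass
class SystemImage:
    """Hypothetical record of one machine as captured at backup time."""
    hostname: str
    disks: list    # disk names
    volumes: dict  # logical volume name -> disk it lives on
    mounts: dict   # mount point -> logical volume name


def restore_plan(image):
    """Return the restore steps in dependency order (nothing is executed)."""
    steps = [f"configure disk {d}" for d in image.disks]
    steps += [f"create volume {v} on {d}" for v, d in image.volumes.items()]
    steps += [f"create file system on {v}" for v in image.volumes]
    # Sorting mount points puts a parent such as / ahead of /var.
    steps += [f"mount {image.mounts[m]} at {m}" for m in sorted(image.mounts)]
    steps.append(
        "restore files: operating system, configuration data, "
        "applications, user files"
    )
    return steps
```

Disks come first, then the volumes and file systems built on them, then the mounts, and only then the files themselves. An automated tool can follow this sequence the same way every time, which is where the consistency comes from.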
All configurations and patches are in place as
they were before the system went down, even for dissimilarly configured Windows
systems. And the recovery technology could have carried out the same procedure
in parallel for multiple restores if the entire data center had been affected.
The benefits from this approach are clear and
significant. It eliminates fire drills and preserves uptime. It keeps revenue
flowing, keeps customers happy, and keeps the company productive. It is a
realistic, proactive way to maintain continuity of operations.
Eric Schou is a
Senior Product Marketing Manager with Symantec Corporation. He is currently a
part of the Veritas NetBackup Product Marketing team. Before joining Symantec,
Eric spent over ten years in the storage industry, working for both Maxtor Corp.
and Quantum Corp. in a marketing capacity. Prior to that, he worked for Arrow
Electronics for five years as a Senior Sales Representative, managing tier one
distribution customers.