IT Lecture Notes by Mark Kelly, McKinnon Secondary

Data Disaster Recovery Plan

Thanks to Jane W for suggesting this page
Read an excellent Age article explaining the need for tested backups when everything goes wrong

 

You are the IT manager at work.

Late one night, you get a phone call.

You race to work, and see this...

After the fire brigade leaves, you enter your office and see this...

Your heart sinks. What do you do now? It's at times like this that you wish you had a data disaster recovery plan (DDRP).

A DDRP identifies the steps to be taken if:

  • the company loses a key employee
  • the company cannot access its computers
  • information on its computers or network is lost
  • the office building is destroyed
  • information is corrupted

...to name but a few contingencies.

What sorts of disaster might strike your valuable data?

According to a White Paper from IBM, the leading causes of data loss are:

Cause                             Share
Hardware or system malfunction    44%
Human error                       32%
Software malfunction              14%
Viruses                           7%
Natural disasters                 3%

And as time goes by, the dangers increase because:

  • businesses are becoming more and more reliant on IT to stay in business
  • paper records are often not kept - all data is stored electronically
  • businesses rely on electronic communications
  • IT systems are becoming increasingly complex and hard for the average person to maintain
  • viruses and hacking 'exploits' are becoming more common and more destructive
  • more and more employees are being given access to corporate data, increasing the chance of damage or loss
  • few corporations know the true value of their data until they lose it
  • more and more corporations are linking their computer systems to communication systems, such as LANs, WANs and the Internet, thereby increasing the vulnerability of their data to external attack
  • the more a computer is used, the more it is relied upon. At the same time, increased use increases the likelihood of system failure.

So, just how disastrous can data loss be?

IBM reported that, "Fifty percent of companies that lose critical business systems for 10 or more days never recover."

For most companies today, data is their business. If that data is lost or corrupted, or merely interrupted for a long enough period, the blow to the company can be fatal. Studies show truly disastrous results for businesses that lose access to data.

When businesses in the following fields lost access to their data for the given time periods, 25% suffered immediate bankruptcy; 40% went bankrupt within two years; and almost all were bankrupt after five years.

Type of Business    Average Length of Data Loss
Banking             2 days
Commercial          3.5 days
Industrial          5 days
Insurance           5.5 days


And how much does it cost to recover data?

National Computer Security Association research results:

Data Recovery Time    Cost       Data Recovered
19 business days      $17,000    20 MB of sales and marketing data
21 business days      $19,000    20 MB of accounting data
42 business days      $98,000    20 MB of engineering data
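To put those figures in perspective, here is a minimal Python sketch of the arithmetic, taking each data set to be 20 megabytes. The function names are made up for illustration; only the days, costs and sizes come from the table.

```python
# Figures from the table above; each data set is assumed to be 20 megabytes.
RECOVERY_CASES = [
    # (data type, business days, cost in dollars, megabytes recovered)
    ("sales and marketing", 19, 17_000, 20),
    ("accounting", 21, 19_000, 20),
    ("engineering", 42, 98_000, 20),
]


def cost_per_megabyte(cost: float, megabytes: float) -> float:
    """Dollars spent to recover one megabyte."""
    return cost / megabytes


def cost_per_day(cost: float, days: int) -> float:
    """Dollars spent per business day of recovery work."""
    return cost / days


for name, days, cost, megabytes in RECOVERY_CASES:
    print(
        f"{name}: ${cost_per_megabyte(cost, megabytes):,.0f} per MB, "
        f"${cost_per_day(cost, days):,.0f} per business day"
    )
```

On these numbers, the engineering data costs $4,900 per megabyte to recover, which is more than five times the cost of the sales and marketing data.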


Building a DDRP

  • Predefine the conditions that will put your recovery plan into effect: some threats are common to any system; others may be peculiar to a single organisation or location (e.g. a bushfire plan is critical in Australia, while a tornado plan is important in Kansas)
  • Identify decision makers and their roles before, during and after an outage emergency
  • Inventory the resources required to bring your IT systems back online
  • Document your assumptions about backup technique, frequency and storage location, since these determine how old the restored data will be (its 'vintage') and how easily it can be retrieved
  • Prioritize and sequence the restoration actions defined in your recovery plan into a detailed timeline and checklist
  • Predefine an operation center to coordinate status, issues and assignments
  • Develop communication strategies for keeping your employees and customers informed
  • Organise your recovery plan into a flexible, easily maintained tool
  • Validate your recovery plan by conducting simulations based on real-life outage emergency declarations
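The step about prioritising and sequencing restoration actions can be sketched in Python. This is only an illustration: the action names and their dependencies below are hypothetical, and a real plan would list its own. It uses the standard library's graphlib.TopologicalSorter (Python 3.9 or later) to produce a checklist in which no action appears before the actions it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical restoration actions, each mapped to the actions that must
# be finished first. The names are illustrative, not from a real plan.
DEPENDS_ON = {
    "obtain replacement server": set(),
    "fetch backup tapes": set(),
    "install operating system": {"obtain replacement server"},
    "restore data from tapes": {"install operating system", "fetch backup tapes"},
    "re-enter lost transactions": {"restore data from tapes"},
    "bring server online": {"re-enter lost transactions"},
}


def restoration_checklist(depends_on: dict[str, set[str]]) -> list[str]:
    """Return the actions in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


checklist = restoration_checklist(DEPENDS_ON)
for step, action in enumerate(checklist, start=1):
    print(f"{step}. {action}")
```

Writing the dependencies down explicitly means the checklist can be regenerated whenever an action is added or removed, which helps keep the plan easy to maintain.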

Let's return to the fire at your office...

You should ring the office manager, but you don't have her home phone number. You need to ring the insurance company immediately to get the destroyed equipment replaced, but you can't remember what company insures you or where the policy is (oh dear... you remember: the policy was in the filing cabinet in your burnt-out office). You need to rent emergency equipment to get back into business... but you can't remember the phone number of that company either. You need to get your backup tapes to restore the file server's data... oh no... the backup tapes were in the filing cabinet with the insurance policy. At least you can get a copy of your recovery plan and... oh dear. The only copy of the plan was stored on the file server.

You really are up the proverbial brown creek...

You wake up in a cold sweat. You are safe in bed at home. It was just a nightmare....

You get to work early the next morning. The only thing on your mind is preventing the consequences of your nightmare coming true. What do you do?

You print out your draft disaster recovery plan and read it. You discover it's out of date and does not cover many of the problems you faced in your nightmare. You get a team together from management, IT staff and office staff and update and complete the plan.

What should a good DDRP achieve?

  • Provide for the safety and well-being of people on the premises at the time of a disaster;
  • Continue critical business operations;
  • Minimize the duration of a serious disruption to operations and resources (both information processing and other resources);
  • Minimize immediate damage and losses;
  • Establish management succession and emergency powers;
  • Facilitate effective co-ordination of recovery tasks;
  • Reduce the complexity of the recovery effort;
  • Identify critical lines of business and supporting functions.

XYZ Pty. Ltd.
Risk Analysis and Data Disaster Recovery Plan

DATA SERVICES: Finance Section.
LOCATION: 45 Allen St, McKinnon. Second Floor, rooms #215-218

I. LIST OF ALL SENSITIVE INFORMATION SYSTEMS

A. Network Administrator's PC -- Critical and confidential data
B. Administrative Assistant's PC -- Critical and confidential data
C. File server #1 -- Finance records -- Confidential data
D. File server #2 -- User directories -- Confidential data
E. Web server -- corporate website -- Critical data
F. Login server -- Critical data
G. Transaction workstations (75) -- confidential data

All workstations and servers are leased from:

ABC Leasing Co.
495 Collins St, Melbourne 3000. Phone 9384 4958. Contact: Joe Salvani

Equipment Insured by:

Equity Insurance Co.
6th floor, 68 Spring St, Melbourne 3000. Phone 9438 3843.
Policy # 4956-3945. Contact: Sue McIntosh. Policy expires: 4 September 2003.

Leasing and insurance documents are stored:
1) Electronically, on server 1.
2) Printed copy in fireproof safe in office manager's office.
3) Safe Deposit box #193, Fidelity Bank, 66 Brook St, McKinnon. Phone 9727 4834. Key is with office manager.

II. RISK ANALYSIS [by machine]

A. Network Administrator's PC -- this machine includes personal emails and documents, administrative tools, employee password lists, hardware audit. This data is critical to system operation.
B. Administrative Assistant's PC -- This machine has copies of the tools stored on the Network Administrator's PC, and personal emails and documents.
C. File server #1 -- Finance records -- This machine stores a complete record of all historical and current transactions, payroll data, billing and accounts. This data is critical to the operation of the company.
D. File server #2 -- This machine stores primary copies of all employee correspondence, spreadsheets, budgets and customer records. This data is critical to the operation of the company.
E. Web server -- This machine stores the company's website. The data is important to the operation of the company.
F. Login server -- This machine stores authentication data to control access to corporate data. This data is critical to system operation.
G. Transaction workstations -- these machines store application software. They may contain secondary copies of documents on server #2. There should be no data on these machines that is not also stored on server #2.

III. BUSINESS IMPACT ANALYSIS

o Costs of Loss of Critical Information: The cost of recreating the critical information is minimized by the availability of weekly full backups and nightly incremental backups completed on the Administrative Assistant's machine. In the event of loss of any of the servers, we can reload the information from the backups. In the worst case, if the destruction occurred at the end of the day, we would have to rekey just that day's transactions. During the busiest time of the year, that would require two person-days of effort.

o Costs of Loss of Sensitive Information: If sensitive information is exposed, the exposure would be in terms of damage to the reputation of the company. In addition, there is the possibility of costs associated with legal actions.

o Risks: The risk of physical loss of information, both critical and sensitive, is associated with the reliability of the equipment, the power protection afforded the equipment, the security of the premises, and the age of the equipment. We have tried to minimize these risks by the following:

1. Adequate Uninterruptible Power Supplies and associated power protection are provided for each machine;
2. The quality of the equipment, while not the very best, is reasonable, within budget constraints;
3. The machines are on a five-year replacement cycle;
4. The premises are protected with high-quality locks with copy-protected keys, biometric thumbprint identification of employees, HALON fire protection and fire extinguishers, and fire detection systems linked to Acme Security Company. A concealed closed-circuit security camera has been installed in the file server room, which has a reinforced and deadlocked door and has no external windows.

IV. SECURITY SAFEGUARDS

All personnel are made familiar with the requirements for security and confidentiality through one-on-one training by current staff and their Departmental Security Contact.

A. Backups:

A grandfather-father-son backup scheme is employed. Four daily backup tapes are used, one for each day from Monday to Thursday. After 3 months, these tapes are promoted to become weekly tapes. Weekly tapes are promoted to become monthly tapes after 12 months' use. After 3 years' use, monthly tapes are verified for quality and become annual backup tapes, which are archived in the fireproof safe and no longer used.

A daily backup is done by the Administrative Assistant at 4:30. The tapes are given to the Office Manager who will take them home until the same day next week.

Full weekly backups are performed using weekly backup tapes each Friday at 4:30, using the same procedure as for the daily backup. A copy of the weekly backup tape is stored in the fireproof safe, in case of disaster at the Office Manager's home.

Monthly backups are performed, using the appropriate monthly tape (January-November), on the last Friday of each calendar month, as described above.

Annual backups are performed on December 23rd at 4:30, as described above. These tapes are made permanently write-protected and are stored for archival purposes in the fireproof safe for ten years. A copy of the annual backup is burned to DVD-ROM and stored in a locked cabinet in the Box Hill branch office. After 10 years, the backup tapes are to be destroyed by the office manager, but the DVD-ROM copy should be retained.
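The schedule described so far can be sketched as a small Python function that says which kind of backup is due on a given date. Several details are assumptions, because the plan does not spell them out: the annual backup is taken on 23 December whatever day of the week it falls on, no backups run at weekends, and the last Friday of December is treated as an ordinary weekly backup because the monthly tapes only cover January to November.

```python
from datetime import date, timedelta

FRIDAY = 4    # date.weekday() numbers Monday as 0
SATURDAY = 5


def is_last_friday_of_month(day: date) -> bool:
    """True if `day` is a Friday and the next Friday falls in another month."""
    return day.weekday() == FRIDAY and (day + timedelta(days=7)).month != day.month


def backup_type(day: date) -> str:
    """Return 'annual', 'monthly', 'weekly', 'daily' or 'none' for `day`."""
    if day.month == 12 and day.day == 23:
        return "annual"  # assumption: taken even if 23 December is a weekend
    if day.weekday() >= SATURDAY:
        return "none"  # assumption: no backups on Saturday or Sunday
    if day.weekday() == FRIDAY:
        # Monthly tapes exist for January-November only.
        if day.month != 12 and is_last_friday_of_month(day):
            return "monthly"
        return "weekly"
    return "daily"  # Monday to Thursday


for example in (date(2002, 6, 5), date(2002, 6, 7), date(2002, 6, 28), date(2002, 12, 23)):
    print(example.isoformat(), backup_type(example))
```

A function like this could drive a reminder for whoever loads the tapes, so that the right tape is used on the right day.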

All backups are to be automatically verified for accuracy as they are written.
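In practice the backup software does this verification itself. As a rough illustration of the idea only, the Python sketch below compares a SHA-256 checksum of a source file with that of its copy. The file names are hypothetical, and ordinary files stand in for tape media.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches_source(source: Path, backup: Path) -> bool:
    """True if the backup copy has the same contents as the source file."""
    return sha256_of(source) == sha256_of(backup)


# Small demonstration using throwaway files in place of real backup media.
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "ledger.dat"
    backup = Path(folder) / "ledger.bak"
    source.write_bytes(b"invoice 1001\n" * 500)
    backup.write_bytes(source.read_bytes())
    print(backup_matches_source(source, backup))  # True: the copy is identical
    backup.write_bytes(b"corrupted")
    print(backup_matches_source(source, backup))  # False: the copy differs
```

A check like this only proves the copy was written correctly. It does not prove the data can actually be restored, which is why the plan should also include test restores.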

B. Paper forms used for data input, and reports associated with confidential information, are kept in files which are locked when we are away from our offices. Offices are kept locked after normal work hours, on weekends and holidays, and during periods when all staff are absent from the office area. All computers in the office are password-protected and have inactive-lock time-out software installed. The most sensitive files on both the Administrator's and the Assistant's machines are also password protected. Critical financial information is encrypted using RSA encryption. The decryption keys are kept in the safety deposit box at the bank described above.

C. Access to our sensitive system information is limited to the Administrator and the Assistant. Master passwords to gain unrestricted access to the file servers are kept in the safety deposit box, and should not be changed unless:

  • a breach of security is suspected, or
  • the office manager, administrator or assistant administrator leaves the company

in which case the master passwords should be changed immediately and the new passwords stored in the safety deposit box in place of the old ones.

Reports required by government departments such as the Taxation Office should be transmitted in sealed transfer envelopes by registered mail.

D. The disaster recovery plan, security safeguards, access rights, and staff responsibilities are covered in our office staff policies and procedures training manual. This manual is reviewed yearly and updated as required. No employee should be given access to any data unless it is necessary for them to conduct their duties. A list of data access privileges for each job description is published in the staff policy manual. The network should be configured to force the expiry and changing of all (except master) passwords at least every three months.
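The password expiry rule is normally enforced by a setting in the network operating system. Purely to illustrate the logic, here is a Python sketch in which "three months" is taken to mean 90 days and master passwords are exempt. Both the 90-day figure and the function name are assumptions.

```python
from datetime import date, timedelta

MAX_PASSWORD_AGE = timedelta(days=90)  # assumption: "three months" read as 90 days


def password_expired(last_changed: date, today: date, is_master: bool = False) -> bool:
    """True if an ordinary password is older than the maximum allowed age."""
    if is_master:
        return False  # master passwords are exempt from routine expiry
    return today - last_changed > MAX_PASSWORD_AGE


print(password_expired(date(2002, 1, 1), date(2002, 6, 7)))  # True: changed over 90 days ago
print(password_expired(date(2002, 5, 1), date(2002, 6, 7)))  # False: changed 37 days ago
```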

E. Employee security:

  • No floppy disk drives are installed in employee workstations, to avoid unauthorised copying of sensitive or confidential data. No employee is to bring to work any unauthorised data storage device such as USB memory keys, external plug-in storage media such as hard disk drives, 'Zip' drives, or CD burners. Breaches of this rule will result in immediate dismissal.
  • For the same reason, all outgoing emails are to be logged and copies are to be kept.
  • All electronic communications with our branch offices that contain sensitive data must be encrypted using 128-bit RSA encryption.
  • As soon as an employee is dismissed or resigns, the employee's access to data must be terminated.
  • No employee may give their passwords to any other employee (apart from the Administrator), or use any other employee's passwords to gain access to data for which they should not have access rights.

F. Equipment Auditing:

The Administrator will maintain and manage an active inventory of all equipment and software located in the organisation. All incoming equipment and software will be labeled and tracked for identification purposes when it enters the company.

V. PLAN ACCURACY: This plan is tested and reviewed yearly and updated as required. All backup procedures should be tested annually. Backup equipment should be tested and serviced annually.

Contact Data of Key Personnel

The following employees' data should be kept on file by the Office Manager, and copies kept at home by each of the other key personnel: Office Manager, Administrator, Assistant Administrator.

Name: _______________________________
Phone extension: _____
Home Phone:__________________________
Home Address: ________________________
E-mail: ______________________________
Emergency contact: ____________________
Last updated on: ___/___/______
Next update due: ___/___/______

In the event of emergency to key personnel (death, disappearance, dismissal, serious injury):

Office Manager: the Administrator is to immediately assume the temporary role of Office Manager. If the safety deposit box key cannot be located, a copy of the key is in the safe-keeping of the General Manager of XYZ Pty Ltd (phone 8348 7022, or mobile 0041 304 495). The combination of the fireproof safe is also held by the General Manager.

Administrator: the assistant administrator is to immediately assume the temporary role of Administrator. System passwords may be obtained from the office manager.

ESSENTIAL SYSTEM INFORMATION

Backup drive type:
Exxon Model VA394. Contact ViaTech, 45 Paragon St, Cheltenham. Phone 9583 2938

Backup software needed for data recovery:
TruData, version 6.03 (Enterprise Edition). Backup copies of this software are in the fireproof safe.
In an emergency, contact ProDat Data Recovery, 7/394 Centre Rd, Chelsea. Phone 9773 3949.

Server configuration: all servers are the same type, HP Server ND4056. Refer to the leasing company named above for emergency replacement machines. Operating system is Novell Netware 5. Web server OS is Linux 7.03.3045 running Apache version 4.685.345 (current at 23 August 2002).

Workstation software: Master copies of workstation software are stored on CD in the fireproof safe. Basic configuration is: Windows 2000, StarOffice version 6, Netscape Communicator 7, QuickBusiness 3.04. Software licences are stored in the safety deposit box.

A copy of this DDRP is stored on server 1 in SYS:\\DOX\DDRP.DOC. Printed copies are stored in the fireproof safe, the safety deposit box and with each of the key personnel described above.

EMERGENCY PROCEDURES:

A copy of these procedures is to be included in the employee manual, and prominently posted in all offices. These procedures must be described in the training of all new staff, and reinforced annually to existing staff.

In the case of fire:

- The office manager should, as far as conditions allow:

1) Activate fire alarms manually, if they have not already been activated.
2) Notify the fire brigade (Phone 9384 2345 or 000). In case the telephone system has been disrupted by the fire...
etc.

- The Administrator should, as far as conditions allow:

1) Shut down the file servers and eject the removable hard disk drives. These should be packed in the provided case and taken from the building.
etc.

- Department Managers should, as far as conditions allow:

1) Check all work areas and evacuate all staff.
etc.

- Other Employees should, as far as conditions allow:

1) If a sprinkler is activated when there is no fire, turn off power to the computer equipment immediately, then cover the equipment with the supplied plastic sheets to protect it.
2) etc etc

In the case of server failure:

The system administrator, or the assistant administrator in the absence of the administrator, should:

1) Attempt all appropriate quick measures to bring the server back online.
2) Contact the supplier of the server to arrange an emergency replacement machine.
3) Acquire the most recent backup tapes from the Office Manager.
4) Restore backed-up data, as far as possible, to the server.
5) Organise the re-entry of data entered between the last backup and the installation of the new server.
6) Bring the new server online.
7) Have the failed server repaired or replaced.
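Steps 3 and 4 depend on knowing which tapes to fetch. With weekly full backups and nightly incremental backups, a restore needs the most recent full backup plus every incremental made since. The Python sketch below works that out for a given day of failure. It assumes that backups run at the end of each working day, that every Friday backup is a full one, and it ignores the special case of the annual backup on 23 December.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday() numbers Monday as 0


def restore_chain(failure_day: date) -> list[tuple[date, str]]:
    """List the backups to restore, oldest first, for a failure on `failure_day`.

    Assumes backups run at the end of each working day, so the backup for
    `failure_day` itself has not been taken yet.
    """
    # Walk back to the most recent Friday before the failure: the last full backup.
    full_day = failure_day - timedelta(days=1)
    while full_day.weekday() != FRIDAY:
        full_day -= timedelta(days=1)

    chain = [(full_day, "full")]
    # Add each Monday-Thursday incremental taken after that full backup.
    day = full_day + timedelta(days=1)
    while day < failure_day:
        if day.weekday() < FRIDAY:
            chain.append((day, "incremental"))
        day += timedelta(days=1)
    return chain


# Example: the server fails during Thursday 6 June 2002.
for backup_day, kind in restore_chain(date(2002, 6, 6)):
    print(backup_day.isoformat(), kind)
```

For a failure on a Thursday, the chain is the previous Friday's full backup followed by the Monday, Tuesday and Wednesday incrementals. Anything entered on the Thursday itself has to be rekeyed, as in step 5.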

In the event of a hostage crisis:

etc

etc

Document last updated: 7 June 2002

Of course, such a plan is quite complex and takes quite a bit of work, involving many people in the organisation. But would you rather put in the effort now, and restore your organisation's effectiveness quickly, or skip it and wake up bathed in sweat when the inevitable disaster strikes?

Last changed: March 23, 2007 1:42 PM

IT Lecture notes copyright © Mark Kelly 2001-