No one likes the thought of a disaster. But looking the other way and hoping it won’t happen to you is no way to run a business. If you want your organisation to be resilient, you need a detailed disaster recovery plan—one that sets out the steps needed to restore your ICT systems to a state in which they can support the business after a data breach, equipment failure or natural disaster. Creating such a plan is the goal of a disaster recovery planning process.
How do you generate a plan? First you need to perform a risk analysis (RA) and a business impact analysis (BIA). These two processes identify the ICT services that support your business-critical activities. Once the analyses are complete, you will also need to define a timeline for recovery and the order in which the systems along that timeline have to be restored. (For instance, you need a working server or some way to connect to the internet before you can access your cloud storage.)
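One way to work out a recovery order is to write down what each service depends on and let a topological sort arrange them. The sketch below is illustrative only: the service names and the dependency map are hypothetical, and a real map would come out of your own RA and BIA.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each service lists what must be restored first.
dependencies = {
    "network": set(),
    "server": set(),
    "internet_connection": {"network"},
    "cloud_storage": {"internet_connection"},
    "email": {"server", "internet_connection"},
}


def recovery_order(deps):
    """Return the services in an order that restores every prerequisite first."""
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)
```

Whatever order comes out, the internet connection always appears before cloud storage. If two services depend on each other, the sort raises an error, which is itself a useful sign that the recovery sequence needs rethinking.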
Once all of this is complete, you can move on to developing disaster recovery strategies and a comprehensive recovery plan.
So how do you write that plan and develop those strategies?
Developing Disaster Recovery Strategies
The strategies you put in place define what you plan to do when responding to an incident. They should set out the approaches the business will take to protect itself and create the required resilience. The principles include prevention, detection, response, recovery and restoration of business services after an incident.
You should begin with a list of your critical business requirements. Next, you will create a table. For each Critical System, the table should include headings to identify (at minimum) Recovery Time Objective, Recovery Point Objective, Threat, Prevention Strategy, Response Strategy and Recovery Strategy.
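If you keep the table in a script rather than a spreadsheet, it might look something like the sketch below. The column names follow the headings listed above; the example rows, the use of hours as the unit and the helper function are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class StrategyRow:
    """One row of the strategy table (units and values are illustrative)."""
    critical_system: str
    rto_hours: float  # Recovery Time Objective
    rpo_hours: float  # Recovery Point Objective
    threat: str
    prevention_strategy: str
    response_strategy: str
    recovery_strategy: str


def missing_fields(row):
    """Return the names of any columns that are still blank."""
    return [f.name for f in fields(row) if getattr(row, f.name) in ("", None)]


# Hypothetical entries, not recommendations.
rows = [
    StrategyRow("File server", 4, 1, "Disk failure", "RAID and monitoring",
                "Switch users to a read-only replica", "Restore from latest backup"),
    StrategyRow("Email", 8, 4, "Provider outage", "",
                "Notify staff by phone", "Fail over to a secondary provider"),
]

# Map each incomplete system to the columns that still need filling in.
incomplete = {r.critical_system: missing_fields(r) for r in rows if missing_fields(r)}
print(incomplete)
```

Here the Email row is flagged because its Prevention Strategy is still empty, a quick way to spot gaps before the table is signed off.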
Filling out this table is a time to consider everything that could possibly go wrong. But how do you decide what your Prevention Strategy, Response Strategy and Recovery Strategy should be? Issues to consider include budgets, risk management attitudes, resource availability, cost/benefit analysis, staff and technology constraints, and compliance requirements.
Additional factors include:
- People: The availability of staff and contractors with critical skills.
- Physical Facilities: Security and staff access to alternative work areas.
- Technology: Power, data and communication infrastructure; failover and replication of data.
- Data: Backup of critical information; security of the backup location.
- Suppliers: Availability of equipment, materials and contractors.
- Policies and Procedures: Senior management approval for the recovery process; step-by-step procedures for recovery.
Translating the Disaster Recovery Strategies into a DR Plan
Once your disaster recovery strategies have been developed, you will need to drill down and define the high-level action steps that will ultimately become your disaster recovery plan. These action steps can be broken down into detailed step-by-step procedures that follow a logical sequence.
Your written procedures should be easy to use and easy to follow, and spell out how to go from an event to recovery in a minimal amount of time.
There are international standards worth reviewing for incident response activities, such as ISO/IEC 24762 and ISO/IEC 27035 (formerly ISO/IEC TR 18044). These set out policies and procedures to be followed in the event of a disaster.
ICT disaster recovery plans should form part of an incident response process, and include a definition of the steps to be taken to recover. This process can be seen as a timeline, in which response actions precede disaster recovery actions.
The Plan Structure
The best plans have an initial section that summarises key steps and lists key contacts. The plan should then go on to include the following sections:
- Introduction: Scope and purpose of the plan, who has signed off and who is authorised to activate the plan.
- Roles and Responsibilities: Roles and responsibilities of the DR team; their contact details and limit of authorisation.
- Incident Response: The definition of an event, how to determine the severity of the event, how to contain the event and minimise the damage, whom to contact and why they are on the contact list.
- Plan Activation: What constitutes a reason to launch the plan, and the criteria for the plan launch. This should include assembly areas, management decision requirements, and who has the authority to activate the plan.
- Document History: Dates and revisions.
- Procedures: The more detailed the plan and procedures, the more likely that ICT systems will be recovered and returned to normal operations.
- Other Relevant Information: System inventories, network and system assets, contracts and service level agreements, and any other documentation that will be needed to recover the system.
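A draft plan can be checked against this section list automatically. The following is a minimal sketch that assumes each section heading sits on its own line of a plain-text plan; the function name and the sample draft are hypothetical.

```python
# Section names taken from the list above.
REQUIRED_SECTIONS = [
    "Introduction",
    "Roles and Responsibilities",
    "Incident Response",
    "Plan Activation",
    "Document History",
    "Procedures",
    "Other Relevant Information",
]


def missing_sections(plan_text):
    """Return required sections that do not appear as a line of their own."""
    headings = {line.strip().lower() for line in plan_text.splitlines()}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


# Hypothetical draft with three sections still to be written.
draft = "Introduction\nRoles and Responsibilities\nIncident Response\nProcedures\n"
gaps = missing_sections(draft)
print(gaps)
```

A check like this only confirms that the headings exist. It says nothing about whether the content under them is good enough, so it complements a review rather than replacing one.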
Once your DR plans have been completed, they are ready to be exercised. Testing is a critical component of any DR plan. One of the best ways to test your disaster recovery plan is to turn it into a tabletop exercise. As in an old role-playing game, someone presents a scenario to the management and DR teams, who then talk through their response. This confirms that everyone has the plan in their heads.
Lastly, you must create staff awareness. You will need to train and educate your staff so that every employee knows what to expect and what their role should be if disaster strikes. Since the DR plan may change, you also need to implement a level of record management to ensure that updates to the plan are distributed in a timely manner. All this may sound tedious, but it is essential: having a plan in place is no use if no one knows where to find it. Employee awareness will ensure that your strategy is actually carried out promptly enough to prevent catastrophe.