DRP stands for Disaster Recovery Plan. It is a set of technical and organizational processes that ensures the recovery of information technology services after a catastrophe such as a fire or a hardware failure. It is a specific case of a Business Continuity Plan, which deals with keeping business operations running in general.
Before defining the plan in detail, there are some considerations that can be useful to determine the scope and needs of a DRP.
To help define the scope and priorities of a DRP, the following questions can be answered:
Discussing these considerations greatly helps in defining a DRP, as they must be kept in mind when choosing between solutions that differ in implementation and usage costs.
There are different types of infrastructure configurations that can be identified for DRP. Here are the top three to keep in mind:
The infrastructure is rebuilt from scratch in case of disaster, using infrastructure as code, pipelines, scripts, or manual configuration. This is the least expensive solution, as you do not have to run additional instances. However, you have to consider the cost of the time needed to rebuild the infrastructure.
Sometimes, depending on the components rebuilt, the delay can also be quite long (for example, a database instance might take several hours to be provisioned).
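To get a feel for whether a rebuild from scratch is realistic, you can add up the provisioning time of each component and compare it with the recovery time you are aiming for. The sketch below is only an illustration: the component names, the durations, and the helper functions are hypothetical, and real figures should come from timing an actual rebuild.

```python
from datetime import timedelta

# Hypothetical provisioning estimates; replace with measured values.
PROVISIONING_ESTIMATES = {
    "network": timedelta(minutes=10),
    "database": timedelta(hours=3),
    "app_servers": timedelta(minutes=30),
}


def estimate_rebuild_time(estimates, parallel=False):
    """Estimate the total rebuild time.

    Sequential rebuild: the sum of all durations.
    Fully parallel rebuild: the longest single duration.
    """
    durations = list(estimates.values())
    if not durations:
        return timedelta(0)
    if parallel:
        return max(durations)
    return sum(durations, timedelta(0))


def meets_target(estimates, target, parallel=False):
    """Return True if the estimated rebuild fits within the target recovery time."""
    return estimate_rebuild_time(estimates, parallel) <= target


if __name__ == "__main__":
    print("sequential:", estimate_rebuild_time(PROVISIONING_ESTIMATES))
    print("parallel:  ", estimate_rebuild_time(PROVISIONING_ESTIMATES, parallel=True))
    print("fits in 4h:", meets_target(PROVISIONING_ESTIMATES, timedelta(hours=4)))
```

In practice most rebuilds sit somewhere between the two extremes, because some components depend on others (an application server usually needs its network and database first).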
Some of the infrastructure is replicated but does not receive traffic, or is shut down where possible. In case of disaster, the passive environment is made active once the remaining configuration (manual or scripted) has been applied. This reduces the deployment time but can increase the cost, as more machines are deployed.
The infrastructure is replicated in real time and can serve traffic in a different region. This ensures failover and can also reduce latency for users in different regions. The manual or automated operations needed for failover are significantly reduced. However, the cost is multiplied, as several machines are running at the same time.
Depending on the availability needs of each component (which should be listed when defining the RTO and RPO, the Recovery Time Objective and Recovery Point Objective), you can use one of these solutions for your DRP. Sometimes a solution might not be available for the component you target, or it might require too much rework on your infrastructure to be viable.
Taking these constraints into account also helps during the decision process.
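The trade-off between the three configurations can be sketched as a small selection rule: keep the options whose recovery time fits the target for a component, then pick the cheapest one. This is a minimal sketch, not a sizing tool. The strategy names, recovery times, and relative costs are hypothetical placeholders that you would replace with your own estimates.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Strategy:
    name: str
    recovery_time: timedelta  # estimated time to restore service
    relative_cost: float      # running cost relative to a single environment


# Hypothetical figures, for illustration only.
STRATEGIES = [
    Strategy("rebuild from scratch", timedelta(hours=6), 1.0),
    Strategy("passive replica", timedelta(hours=1), 1.5),
    Strategy("active replica", timedelta(minutes=5), 2.0),
]


def cheapest_strategy(
    target: timedelta, strategies: Sequence[Strategy] = STRATEGIES
) -> Optional[Strategy]:
    """Return the cheapest strategy that recovers within the target, or None."""
    candidates = [s for s in strategies if s.recovery_time <= target]
    return min(candidates, key=lambda s: s.relative_cost, default=None)


if __name__ == "__main__":
    for hours in (8, 2):
        choice = cheapest_strategy(timedelta(hours=hours))
        print(f"target {hours}h ->", choice.name if choice else "no option fits")
```

A result of `None` means no listed option meets the target, which is a signal to revisit either the target or the architecture of that component.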
Now, let’s focus on an example Azure infrastructure to sketch out a DRP. We will suppose that the scope has already been defined as the most critical application for the business.
Here are the considerations taken into account for this DRP, and the decisions made according to them:
Given these considerations and the Azure solutions available, here is what could be done:
This is an example that illustrates the different possibilities for a DRP, according to the business needs. Of course, depending on the constraints, it should be adjusted.
We have seen an overview of what a DRP is and how it can be implemented on a simple Azure infrastructure. I would like to highlight the following point to keep in mind when thinking of a DRP:
Building a Disaster Recovery Plan can be long and tedious, so it is important to always stay focused on the objectives and scope defined beforehand 😉