High Availability vs Disaster Recovery: What's the difference?

As businesses increasingly rely on IT infrastructure for their everyday running, the thought of a critical failure, a corrupt database or a power outage is not only an operational nightmare but also an information security concern.

Because such worries are so prevalent, organisations are now investing a large proportion of their IT budget in solutions that can ensure they remain up and running if the worst happens. Whilst it is widely recognised that such measures are needed, there is naturally some debate and confusion around which solution provides the greatest level of protection.

What is High Availability and Disaster Recovery?

In the context of modern, digital ways of working, high availability (HA) and disaster recovery (DR) both reduce downtime and maintain business continuity in times of trouble. But what do they mean?

High Availability (HA) – This refers to a system, network or aspect of an infrastructure that is continuously operational for as long as possible.
Disaster Recovery (DR) – This refers to a set of policies and procedures that enable the recovery or continuation of vital infrastructure and systems following a natural or human disaster.

High Availability

While high availability may sound like nothing more than a system or network that is always running, it is more complex than it first appears.

High availability is all about eliminating single points of failure to ensure the continuous running of a system or application. As the core concept of HA is about reducing points of failure, the notion of redundancy is naturally built-in and split into three key areas that are applied to most systems: hardware, software and environmental.

1. Hardware redundancy

This was one of the first ways HA was introduced into the world of computing. Before applications had a continuous internet connection and could be backed up anywhere and at any time, hardware redundancy was vital. Today, manufacturers continue to address points of failure by incorporating redundant storage elements, power supplies and networking solutions.

Redundant storage ensures that data is written to and read from multiple physical disks. This prevents data loss and downtime if a disk or server fails.
Redundant power typically takes the form of multiple power sources, enabling admins to fail over to a backup power supply if a single source fails.
Redundant networking allows connection to multiple independent networks, so that a server remains online if the main network connection fails.
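
To see why duplicating a component helps, the short sketch below estimates the combined availability of several redundant components. It assumes that failures are independent and that the system stays up as long as at least one copy is working. Both are simplifying assumptions, and the function name and the 99% figure are purely illustrative.

```python
def combined_availability(single: float, copies: int) -> float:
    """Estimate the availability of `copies` redundant components.

    Assumes independent failures and that one working copy is enough
    to keep the system running (illustrative model only).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The system is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    # Hypothetical example: power supplies that are each available 99% of the time.
    for n in (1, 2, 3):
        print(f"{n} supply(ies): {combined_availability(0.99, n):.4%}")
```

In practice failures are rarely fully independent (two power supplies may share the same circuit, for example), so real-world gains tend to be smaller than this model suggests.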

2. Software redundancy

As technology and demands developed, developers ensured that applications themselves could tolerate failures in a system, whether caused by hardware faults or configuration errors. Today this is often accomplished by:

Clustering technologies, which allow workloads to be spread across several different servers
Load balancing, which routes incoming requests to healthy application nodes and flags issues early so that failures can be mitigated proactively
Self-healing systems, which move workloads around or allocate additional capacity when failures occur
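
As a rough illustration of the load-balancing idea, the sketch below routes requests round-robin and skips any node marked unhealthy. The `Node` and `LoadBalancer` classes are hypothetical names invented for this example. A real load balancer would also run its own health checks rather than rely on a flag set by hand.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True  # in practice this would be set by a health check


class LoadBalancer:
    """Minimal round-robin balancer that only returns healthy nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # index of the node to try first on the next request

    def route(self) -> Node:
        # Try each node at most once, starting where the last request left off.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c")
    lb = LoadBalancer([a, b, c])
    b.healthy = False  # simulate a failed node
    print([lb.route().name for _ in range(4)])
```

Because unhealthy nodes are simply skipped, requests keep flowing while the failed node is repaired, which is the behaviour the list above describes.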

3. Environmental

As cloud computing continues to rise, providers are now taking HA to another level through two key areas:

Hardware redundancy at the server rack level, allowing users to spread workloads across racks to mitigate single points of failure without having to move to another data centre
Data centre redundancy, allowing users to run applications in separate data centres that are located geographically close to each other, specifically to guard against events that are out of the user’s or data centre operators’ hands
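
Spreading workloads across racks or data centres can be sketched as a simple placement rule: take one host from each failure domain in turn before reusing a domain. The function name, the host names and the rack labels below are hypothetical, and real schedulers weigh many more factors such as capacity and latency.

```python
from collections import defaultdict


def spread_replicas(replicas: int, hosts: dict[str, str]) -> list[str]:
    """Choose hosts for `replicas` copies, cycling through failure domains.

    `hosts` maps a host name to its failure domain (a rack or data centre).
    """
    if replicas < 0 or replicas > len(hosts):
        raise ValueError("replicas must be between 0 and the number of hosts")

    # Group hosts by the rack or data centre they live in.
    by_domain = defaultdict(list)
    for host, domain in hosts.items():
        by_domain[domain].append(host)

    queues = list(by_domain.values())
    chosen = []
    # Take one host per domain per pass, so copies land in as many
    # different domains as possible before any domain is used twice.
    while len(chosen) < replicas:
        for queue in queues:
            if queue and len(chosen) < replicas:
                chosen.append(queue.pop(0))
    return chosen


if __name__ == "__main__":
    hosts = {"h1": "rack-a", "h2": "rack-a", "h3": "rack-b", "h4": "rack-c"}
    print(spread_replicas(3, hosts))
```

With three replicas and three racks, each copy lands on a different rack, so the loss of any single rack leaves two copies running.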

When all of these measures fail and a system or application goes down, disaster recovery comes into play.

Disaster Recovery

Disaster recovery can take shape in a number of different forms, from simply restoring a backup to significantly more complex actions.

Like high availability, disaster recovery is multi-faceted, and it is built around two core concepts:

Recovery time objective

This is the maximum amount of time that a system can be down before it must be restored to its original, operational state. Naturally, this period varies with the system or application and its importance. For low-priority systems, the recovery time can be measured in hours or even days, but for business-critical systems it will usually be measured in seconds or minutes.

Recovery point objective

This is the maximum amount of data loss, measured in time, that can be tolerated in a disaster. To return to the example of low-priority systems, losing a day or two's worth of data may be acceptable, while for business-critical systems such as transactional websites the tolerable loss may be as short as minutes or even seconds.
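
The two objectives can be checked against a real or simulated incident with simple date arithmetic. The sketch below is one way to do that: downtime is compared with the recovery time objective, and the gap between the last good copy of the data and the failure is compared with the recovery point objective. The function name, field names and timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def assess_incident(last_good_copy: datetime, failed_at: datetime,
                    restored_at: datetime, rto: timedelta,
                    rpo: timedelta) -> dict:
    """Compare one outage against its recovery objectives."""
    downtime = restored_at - failed_at       # measured against the RTO
    data_loss = failed_at - last_good_copy   # measured against the RPO
    return {
        "downtime": downtime,
        "data_loss": data_loss,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss <= rpo,
    }


if __name__ == "__main__":
    # Hypothetical incident: nightly backup at 02:00, failure at 14:30,
    # service restored at 15:15.
    result = assess_incident(
        last_good_copy=datetime(2024, 1, 10, 2, 0),
        failed_at=datetime(2024, 1, 10, 14, 30),
        restored_at=datetime(2024, 1, 10, 15, 15),
        rto=timedelta(hours=1),
        rpo=timedelta(hours=24),
    )
    print(result)
```

Running the same incident against a tighter recovery point objective, say 15 minutes, would fail the check, which is the situation a transactional website would be in if it relied on nightly backups alone.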

The nature of the business and the importance of the data or information stored will naturally determine the thresholds you set for your recovery time and recovery point objectives. Where the thresholds are short, data should be replicated continuously between primary and secondary systems, with a backup system ready to take over immediately in the event of a disaster. Where the thresholds are longer, restoring systems from daily backups may often be enough to return operations to normal, with no secondary live site needed.
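
That decision can be written down as a rough rule of thumb. The sketch below assumes that, in the worst case, a failure just before the next backup loses one full backup interval of data, and that you have an estimate of how long a restore takes. The function name, the return strings and the one-day default are illustrative assumptions rather than a standard.

```python
from datetime import timedelta


def suggest_dr_approach(rto: timedelta, rpo: timedelta,
                        restore_duration: timedelta,
                        backup_interval: timedelta = timedelta(days=1)) -> str:
    """Suggest a DR approach from the recovery objectives (rule of thumb)."""
    # Worst case: failing just before the next backup loses a full interval.
    rpo_ok = backup_interval <= rpo
    # The restore itself must finish within the allowed downtime.
    rto_ok = restore_duration <= rto
    if rpo_ok and rto_ok:
        return "restore from backups"
    return "continuous replication with standby"


if __name__ == "__main__":
    # Hypothetical internal reporting tool: two days of downtime is tolerable.
    print(suggest_dr_approach(timedelta(days=2), timedelta(days=1),
                              restore_duration=timedelta(hours=6)))
    # Hypothetical transactional website: minutes of downtime at most.
    print(suggest_dr_approach(timedelta(minutes=5), timedelta(seconds=30),
                              restore_duration=timedelta(hours=6)))
```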

So what is the difference between High Availability and Disaster Recovery?

While both HA and DR aim to keep systems online and functional, they differ in the roles they play.

Because high availability systems are designed to eliminate single points of failure through redundancy, they should continue to operate through power, network or hardware failures. That redundancy should give the user enough time to fix the issue, or to wait for it to resolve itself, whilst operations continue as normal.

Disaster recovery is the final port of call if there is a complete failure, ensuring that mission-critical data is not lost and downtime is kept to a minimum. In short, no matter how available your system is, it is always advisable to have some form of DR plan in place.

High Availability (HA): Addresses power, network, or hardware failures, leveraging redundancies to allow time for issue resolution.
Disaster Recovery (DR): Acts as the final resort in complete failure scenarios, ensuring minimal downtime and safeguarding mission-critical data.