High Availability and Disaster Recovery Support for MySQL in AWS
A question I hear often is whether High Availability and Disaster Recovery are the same thing.
A highly available database or application is often described as “fault tolerant” or as having the ability to “fail over”. Although this implies the database is resilient enough to survive a disaster, implementing high availability alone does not achieve disaster recovery.
So how does Disaster Recovery differ from High Availability?
Disaster recovery includes the use of a geographically distant alternate site, not just redundancy at the database, application, or datacenter level. DR also focuses on re-establishing services after an incident, not just on failover.
Using MySQL databases for mission-critical applications requires implementing both High Availability and Disaster Recovery. Setting up MySQL for High Availability and Disaster Recovery requires in-depth expertise to set up the environments properly. However, most of these steps are simplified or automated in AWS when you provision a MySQL database using Amazon Relational Database Service (RDS).
MySQL High Availability
Provisioning a MySQL database with AWS RDS gives you the option to enable Multi-AZ deployments, which manage synchronous data replication across Availability Zones with automatic failover. When a failure happens, AWS updates the DNS record to point to the running database instance, with all of your committed database updates intact. This allows your applications to keep functioning even when the primary or standby database instance fails due to an unlikely event such as a database component failure or loss of availability in one AWS Availability Zone. Although this is the most common approach to implementing HA for MySQL in AWS, it also roughly doubles the cost of the database, since two instances are running in AWS.
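As a minimal sketch of the Multi-AZ option described above, the request can be expressed with the boto3 `create_db_instance` call. The helper names, the default instance class, and the storage size below are assumptions made for illustration, not values from this article.

```python
def multi_az_mysql_params(identifier, username, password,
                          instance_class="db.t3.micro", storage_gb=20):
    """Build keyword arguments for the RDS CreateDBInstance call.

    The instance class and storage size are illustrative defaults only.
    """
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": instance_class,
        "AllocatedStorage": storage_gb,
        "MasterUsername": username,
        "MasterUserPassword": password,
        # Ask RDS to keep a synchronous standby in another Availability Zone.
        "MultiAZ": True,
    }


def create_multi_az_instance(rds_client, identifier, username, password):
    """Submit the request; rds_client is expected to be boto3.client("rds")."""
    params = multi_az_mysql_params(identifier, username, password)
    return rds_client.create_db_instance(**params)
```

With boto3 installed and credentials configured, this could be called as `create_multi_az_instance(boto3.client("rds"), "orders-db", "admin", password)`, where `orders-db` is a hypothetical instance name. Keeping the parameters in a plain dictionary makes them easy to review before any billable resource is created.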
Read Availability with Read Replicas
Although AWS RDS Read Replicas are designed to scale reads, they also provide better availability for read queries through asynchronous replication. However, replication lag can vary significantly, so recent updates made to a source DB instance may not yet be present on the associated Read Replicas if the source DB instance suffers an unplanned outage. Therefore, Read Replicas do not offer the same data durability and availability benefits as Multi-AZ deployments.
While Read Replicas can provide some read availability benefits, they are not designed to improve write availability.
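The following sketch shows one way to create a Read Replica with the boto3 `create_db_instance_read_replica` call and to decide which replicas are fresh enough to serve reads. The helper names and the 30-second threshold are assumptions; the lag values are assumed to come from the CloudWatch `ReplicaLag` metric, which RDS reports in seconds.

```python
def read_replica_params(source_identifier, replica_identifier):
    """Build keyword arguments for the RDS CreateDBInstanceReadReplica call."""
    return {
        "DBInstanceIdentifier": replica_identifier,
        "SourceDBInstanceIdentifier": source_identifier,
    }


def create_read_replica(rds_client, source_identifier, replica_identifier):
    """Submit the request; rds_client is expected to be boto3.client("rds")."""
    params = read_replica_params(source_identifier, replica_identifier)
    return rds_client.create_db_instance_read_replica(**params)


def replicas_fresh_enough(lag_by_replica, max_lag_seconds=30.0):
    """Return replica names whose lag is within the threshold, least lag first.

    lag_by_replica maps a replica name to its most recent lag in seconds.
    """
    fresh = [(lag, name) for name, lag in lag_by_replica.items()
             if lag <= max_lag_seconds]
    return [name for lag, name in sorted(fresh)]
```

Because replication is asynchronous, an application that needs to read its own recent writes may still have to query the source instance, whatever threshold is chosen here.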
Backup & Restore
You can also consider a MySQL backup and restore approach. You can either create AWS RDS Snapshots or run a MySQL backup command (such as mysqldump) from an EC2 instance, and then issue the restore command when needed. AWS RDS Snapshots for MySQL are stored internally in Amazon S3, which provides high durability and availability by replicating the snapshot across multiple Availability Zones.
The Multi-AZ deployment option also keeps database performance consistent during the backup window, because backups are taken from your standby database instance. Taking backups from the standby instance avoids I/O suspension on the primary DB instance.
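The snapshot and restore steps above can be sketched with the boto3 calls `create_db_snapshot` and `restore_db_instance_from_db_snapshot`. The function names and the snapshot naming scheme are assumptions made for this example; restoring always creates a new DB instance rather than overwriting the existing one.

```python
from datetime import datetime


def snapshot_name(instance_id, when):
    """Build a timestamped snapshot identifier (letters, digits, hyphens)."""
    return f"{instance_id}-{when:%Y%m%d-%H%M%S}"


def take_snapshot(rds_client, instance_id, when=None):
    """Create a manual snapshot and wait until it is available."""
    snapshot_id = snapshot_name(instance_id, when or datetime.utcnow())
    rds_client.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=instance_id,
    )
    # Block until RDS reports the snapshot as available.
    waiter = rds_client.get_waiter("db_snapshot_available")
    waiter.wait(DBSnapshotIdentifier=snapshot_id)
    return snapshot_id


def restore_snapshot(rds_client, snapshot_id, restored_instance_id):
    """Restore the snapshot into a new DB instance with the given name."""
    return rds_client.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=restored_instance_id,
        DBSnapshotIdentifier=snapshot_id,
    )
```

Here `rds_client` is expected to be `boto3.client("rds")`. Since the restored instance gets a new endpoint, the application configuration has to be pointed at it after the restore completes.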
MySQL Disaster Recovery
A Multi-AZ deployment of a MySQL database in AWS RDS also provides disaster recovery up to a certain level, since the database is replicated across multiple Availability Zones (data centers) that are geographically distant from each other. However, depending on the business requirements, you may need to replicate the data of your MySQL instances to a different AWS region or to an on-site data center.
Disaster Recovery with Read Replicas
You can also use Read Replicas for disaster recovery of MySQL databases. AWS recently announced Cross-Region Read Replicas for Amazon RDS for MySQL, which allow you to keep a read replica in a different region than the database master. When a disaster happens, this makes it possible to promote a replica in a different region to be the new master, or to restore data from the replica node.
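A rough sketch of the cross-region flow follows, using the boto3 calls `create_db_instance_read_replica` and `promote_read_replica`. For a cross-region replica, the source is referenced by its ARN and the request is sent to the destination region. The helper names, regions, and account number used here are hypothetical.

```python
def source_db_arn(region, account_id, db_identifier):
    """Build the ARN of the source DB instance in its home region."""
    return f"arn:aws:rds:{region}:{account_id}:db:{db_identifier}"


def cross_region_replica_params(source_region, account_id,
                                source_identifier, replica_identifier):
    """Build keyword arguments for a replica created in another region."""
    return {
        "DBInstanceIdentifier": replica_identifier,
        "SourceDBInstanceIdentifier": source_db_arn(
            source_region, account_id, source_identifier),
    }


def create_cross_region_replica(dr_rds_client, source_region, account_id,
                                source_identifier, replica_identifier):
    """dr_rds_client must be an RDS client for the DESTINATION region."""
    params = cross_region_replica_params(
        source_region, account_id, source_identifier, replica_identifier)
    return dr_rds_client.create_db_instance_read_replica(**params)


def promote_replica(dr_rds_client, replica_identifier):
    """Promote the replica to a standalone, writable DB instance."""
    return dr_rds_client.promote_read_replica(
        DBInstanceIdentifier=replica_identifier)
```

The destination client could be created with `boto3.client("rds", region_name="us-west-2")`, assuming that is the chosen DR region. Promotion stops replication permanently, so any updates that had not yet reached the replica are not recovered by this step.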
Extract, Transform, and Load (ETL)
This is one of the most common approaches used for database disaster recovery. You can set up an ETL pipeline using AWS EC2 instances, AWS Lambda, or AWS Glue to move data from AWS RDS to other data stores in different regions.