PostgreSQL Backups 101

Published Jul 29, 2019

Image by Steve Buissinne from Pixabay

Backing up your database is essential to ensure business continuity in the event of a failure or disaster. Having your data safe ensures you can bounce back and restore operations quickly. Read on to learn more about various backups methods for PostgreSQL databases.

PostgreSQL Backup Types

Postgres backups can be either physical or logical. Physical backups are known as filesystem-level snapshots. Logical backups are known as SQL dumps. Both backup types are useful, and each has its advantages or disadvantages.

On one hand, a database is a group of files on a drive in a data center. On the other hand, a database is perceived as a sequence of SQL statements used to create and manipulate the data.

SQL Dump

This method works by generating a text file with SQL commands to recreate the Postgres cluster, a database or a given table. It uses commands pg_dump and pg_dumpall, although only for logical exports. This makes it a poor choice for large databases since it takes too long to import and export.

Continuous Archiving(CA) and Point-In-Time-Recovery (PITR)

It is a combination of a file-system-level backup with a backup of the WAL files (Write Ahead Log). This method works by archiving WAL logs continuously in a secure location, where they can be used for PITR. WAL files provide atomicity and durability (two of the ACID properties) in database systems. It works by writing all modifications to a log before they are applied. This method recovers the data from the point-in-time recovery, thus using the WAL to retrieve the last version

PostgreSQL has several tools that help perform incremental, continuous and point-in-time-recovery from backup—for example, pg_basebackup, archive_command, restore_command and pg_archivecleanup.

There are several solutions for postgres backups that help take advantage of the numerous tools the platform provides for backup and recovery.

Pros and Cons of Each Postgres Backup Method

Both methods can be useful according to your organization backups needs, for example, SQL dumps are more suitable for small and medium databases while PITR is best suited for large databases. Let's review the pros and cons of both approaches.

SQL Dumps

This approach characterizes because of its granularity, making it a flexible backup regarding the database components. Some of its advantages and disadvantages are:

Pros

Compact—they only contain data and object creation statements. A dump reduces indexes to a simple line, using GZip to compress them. Moreover, you can choose the compression level.
Portable—dumps are platform agnostic, therefore you can restore the database in any operating system supported by PostgreSQL (Linux, Windows, MacOS). In addition, you can transform a Postgres dump to be readable by another database system by using the INSERT mode. You can create a dump during normal operations without needing to stop traffic to do it.
Flexible— you can create a dump from a table, a single database or a cluster. When restoring you can use custom formats and the pg_restore tool.
Consistent—through its proprietary Multi Version Concurrency Control and snapshot features, PostgreSQL ensures the dump is like a picture of the database at the time it started including any constraints and permissions.
Tests the database—PostgreSQL scans all the tables to create a dump, therefore if there are errors on the data pages, the program sends an alert.

Cons
Restoring from dumps is usually slower than other methods especially when created using aggressive parallelism. Databases are constantly changing, thus if you need to restore several hours after the last dump backup, you may risk losing data changes.

PIT strategy

Point in time recovery gives you the flexibility of recovering according to a timeline. It has advantages and disadvantages such as:

Pros

Fast copying—the speed depends on your network and disks, you can plan for the system to keep running during backup In addition, since restoring functions uses the same metrics you can predict how long it takes recovery in case of disaster.
Faster recovery—since there is no need to recalculate indexes. Restore involves replaying enough WAL to have a consistent database. You can replay all WAL, restoring to the closest time to the incident, keeping the data loss to the minimum.
Point-in-time recovery— you can tell PostgreSQL to stop replaying WAL st a given time. For example, giving it a timestamp to stop replayin WAL a second before an incident. Moreover, you can mark the start of an operation, and in case of disaster, restore until that mark.
Live database backup—this method requires you to perform the backup with a live database. Save from the disk reads and network processing, all the rest can be done remotely.

Cons

Not portable— you can only restore the database in the same PostgreSQL version, on the same operating system. For example, it is not possible to restore a Linux 64 bit origin on a Linux 32 bit.
No flexibility—you can only backup the entire cluster with all the databases, tables and objects. To ensure consistency, the WAL should be replayed to a minimum. Moreover, since the copy is not atomic, it means that the backup will be copied and restored as is, even with flaws and errors.

The Bottom Line

PostgreSQL offers two main approaches to backup, to address the different needs of its users. Which will you choose will depend on your backup and recovery needs. For example, if you need to be able to restore individual tables or databases, then SQL dump can be a good fit.

However, if your database is changing constantly, and you need to keep pace with time changes, then PITR could be the approach for you. Whichever you choose, you can leverage the benefits of streaming replication and incremental backups that are part of PostgreSQL.

Postgresql Database

Report

Enjoy this post? Give Eddie Segal a like if it's helpful.

Eddie Segal

Data Scientist

A flavoured data scientist, with 5 years of experince in data, half in data science. Recently engaged in developing Deep Learning based systems, with ho python (Tensorflow).

Discover and read more posts from Eddie Segal

get started