Skip to main content

Migrating or Upgrading PostgreSQL with Minimal Downtime

This guide can be used for upgrading PostgreSQL 12+ with minimal downtime or for migrating between different cloud services like Heroku PostgreSQL to AWS RDS.

note

This tutorial assumes both PostgreSQL databases are in the same network.

If they are not you'll need to set up a VPN tunnel between your networks during your migration.

A great option is tailscale. We'll be posting a tunneling runbook soon!

This runbook is based on this webinar.

There is an accompanying tutorial that runs in docker compose that can be used for practice.

Upgrading / Migrating Postgres with Logical Replication

PostgreSQL logical replication is a data synchronization method that enables the replication of individual database changes (such as inserts, updates, and deletes) in a fine-grained manner between two PostgreSQL databases. It uses a publish-subscribe model, allowing changes from a source database's specified tables to be captured, transformed into a logical representation, and then applied to target databases.

With logical replication there are two options for getting your existing data into the destination database:

  • Using copy_data = true with the replication subscription
  • Using copy_data = false and performing pg_dump/pg_restore

This tutorial will use the copy_data = false option and a pg_dump/pg_restore as it is more efficient for large databases.

This is the recommended approach for large data sets especially migrating or upgrading across networks. It can be time and resource consuming to replicate 100GB between something like Heroku and AWS RDS. This requires additional steps and brief downtime to prepare.

Pros:

  • Controlled Replication: You have control over when the replication process occurs. This can help you manage the impact on server resources.
  • Reduced Network Traffic: As replication is not continuous, there's less ongoing network traffic compared to real-time replication.

Cons:

  • Complexity: Manual dump and restore processes involve more steps and potential for errors, especially with large datasets.
  • Data Lag: Data replication is not immediate, resulting in data lag between the source and target databases.
tip

If you have a small database we suggest dump/restore if you can tolerate downtime, otherwise for small databases that cannot tolerate downtime the tutorial has a walkthrough of copy_data = true.

To perform logical replication we'll need to make sure the WAL level (wal_level) is set to logical.

The Write-Ahead Log (WAL) in PostgreSQL is a transaction log that records changes to the database in a sequential manner.

It serves as a reliable mechanism to ensure data durability, high availability, and crash recovery by allowing the replay of logged changes to reconstruct the database to a consistent state in the event of system failures.

Ensure that your WAL level is set to logical and that you have ample replication slots. 10 is a great setting for many upgrades/migrations.

postgresql.conf
wal_level = logical
max_replication_slots = 10

This whole upgrade/migration process is a few step that takes a matter of minutes not including your dump/restore time.

1. Put your application in Maintenance Mode

This step will depend on how you manage maintenance mode in your application.

To get started put your application into maintenance mode. This will be the first and longest of two downtimes (the latter will be brief as your application restarts w/ a new database connection string).

Its important to stop the application from receiving new requests because we don't want records written between the time the database dump is taken and the replication slot is created.

2. Use pg_dump to make a backup of your old database

Connect to your old database and run pg_dump:

Dump old database
pg_dump -U YOUR_USER -F t YOUR_DATABASE > YOUR_DATABASE-dump.tar

3. Create publication & replication slot on your old database

Connect to psql on your old database
psql -U YOUR_USER -d YOUR_DATABASE
Create publication on your old database
CREATE PUBLICATION pub_pg_AN_IDENTIFIER_FOR_YOUR_MIGRATION_migration FOR ALL TABLES;

FOR ALL TABLES will publish changes for all tables, you can also set specific tables for replication.

Create replication slot on your old database
SELECT pg_create_logical_replication_slot('sub_pg_AN_IDENTIFIER_FOR_YOUR_MIGRATION_migration', 'pgoutput');

pgoutput is an output format for the pub/sub mechanism.

4. Bring your application out of maintenance mode.

This step will depend on how you manage maintenance mode in your application.

This is the end your first and longest downtime. Records will still be written to your old database, but also queued to be published to your replication slot.

5. pg_restore to your new database

Above we dumped our database to YOUR_DATABASE-dump.tar. This will need

Restore dump to your new database
pg_restore -d YOUR_DATABASE YOUR_DATABASE-dump.tar --no-owner --role=YOUR_ROLE -C -U YOUR_ROLE;

6. Create a paused subscription on your new database

Connect to psql on your new database
psql -U YOUR_ROLE -d YOUR_DATABASE;
Create a replication subscription from your new database
CREATE SUBSCRIPTION sub_pg_AN_IDENTIFIER_FOR_YOUR_MIGRATION_migration 
CONNECTION 'dbname=YOUR_OLD_DATABASE host=YOUR_OLD_HOST user=YOUR_OLD_ROLE password=YOUR_OLD_PASSWORD'
PUBLICATION pub_pg_AN_IDENTIFIER_FOR_YOUR_MIGRATION_migration WITH (copy_data = false, create_slot=false, enabled=true, slot_name=sub_pg_AN_IDENTIFIER_FOR_YOUR_MIGRATION_migration);

A few notes on the WITH options here:

  • copy_data - tells the subscription not to copy all existing data, this can take a very long time across networks
  • create_slot - will disable auto creation of the replication slot. We created it ahead in advanced.
  • enabled - enable the subscription to immediately start getting new records
  • slot_name - the replication slot to use

7. Check replication status between databases

You should now be able to run queries on your old database and new database. Results may be different as replication catches up.

You can estimate the replication time by getting a few log sequence numbers (LSN) from both databases:

Connect to your old database through psql and run the following to get the current WAL LSN:

Get current WAL LSN from old database
select pg_current_wal_lsn();

You'll get a result similar to:

pg_current_wal_lsn
--------------------
0/25A97D0

Connect to your new database through psql and run the following to get the most recently received LSN:

Get most recent LSN from new database
select * from  pg_stat_subscription;

You'll get a result similar to:

-[ RECORD 1 ]---------+------------------------------
subid | 26874
subname | sub_pg_AN_IDENTIFIER_FOR_YOUR_MIGRATION_migration
pid | 1781
relid |
received_lsn | 0/25A97D0 <-- Last LSN received
last_msg_send_time | 2023-08-09 04:10:25.602431+00
last_msg_receipt_time | 2023-08-09 04:10:25.602842+00
latest_end_lsn | 0/25A97D0 <-- Last LSN reported back to publisher
latest_end_time | 2023-08-09 04:10:25.602431+00

If these values are the same, you are in sync!

They are likely not the same in a high scale system, so you can perform some basic math to see how far apart they are and get an estimate of when you should start database promotion.

Replication lag in bytes
-- OLD_DATABASE_CURRENT_WAL_LSN - NEW_DATABASE_RECEIVED_LSN
select '0/25A97D0'::pg_lsn - '0/25A97D0'::pg_lsn size_bytes;

This number should continue to go down, and based on the speed at which it does can give you an idea of how close to in sync your databases are.

8. Promote new database and cutover your application to the new database

The promotion process will introduce your next downtime. You'll want as minimal replication lag as possible.

The promotion process is pretty straightforward:

  1. Put any applications that write to your source database in maintenance mode. This will be your second, brief downtime.
  2. Check the replication and subscription status and drop the subscription
  3. Copy sequence data from the source to destination database
  4. Change the applications database connection info to the new database
  5. Bring your application back up

First, let's put the application in maintenance mode (DOWNTIME BEGINS)

Check the current WAL LSN on your old database:

Get current WAL LSN from old database
select pg_current_wal_lsn();

You'll get a result similar to:

pg_current_wal_lsn
--------------------
0/25A97D0

Check that the last LSN acknowledged matches on your new database:

Get the latest LSN reported from by new database
select latest_end_lsn from pg_stat_subscription where subname = 'sub_pg_AN_IDENTIFIER_FOR_YOUR_MIGRATION_migration';
-- -[ RECORD 1 ]--+----------
-- latest_end_lsn | 0/25A97D0

Since traffic is stopped on your applications and no records are being written, this number should continue to decrease. Once they match, you can cut over.

Again, to check the number of bytes remaining:

Replication lag in bytes
-- OLD_DATABASE_CURRENT_WAL_LSN - NEW_DATABASE_RECEIVED_LSN
select '0/25A97D0'::pg_lsn - '0/25A97D0'::pg_lsn size_bytes;

Once the two values match you can drop the SUBSCRIPTION on your new database:

Drop subscription from new database
DROP SUBSCRIPTION sub_pg_AN_IDENTIFIER_FOR_YOUR_MIGRATION_migration;
caution

Important! Now you'll need to sync the sequence data between your old database and new database.

Sequences in PostgreSQL are database objects that provide a way to generate unique numeric values, often used for auto-incrementing primary keys in tables.

Sequences are not replicated with logical replication. Skipping this step will result potentially conflicting primary key values for auto incrementing keys.

Run the following command on both databases to see the differnces in sequence data:

Display sequence data on both databases
SELECT sequencename, last_value
FROM pg_sequences
ORDER BY last_value DESC NULLS LAST, sequencename;

Run the following command to dump your sequence data from your old database:

Dump old database sequence data
psql -h OLD_DB_HOST -U OLD_DB_USER -XAtqc 'SELECT $$select setval($$ || quote_literal(sequencename) || $$, $$ || last_value || $$); $$ AS sql FROM pg_sequences' OLD_DB_NAME > sequences.sql
Restore sequence data to new database
cat sequences.sql | psql -h NEW_DB_HOST -U NEW_DB_USER NEW_DB_NAME
note

If your application has materialized views, those will need to be manually refreshed as they are not automatically refreshed.

Its time to point the application at the upgraded database.

Change your application's connection on your diagram if using Massdriver or updated your connection string if you are not and bring your application back online.

🎉 Congratulations, you're done!