Careers at Triumph Tech

How to Migrate Large Heroku Postgres Databases to RDS Using WAL-G with Minimal Downtime

How to Migrate Large Heroku Postgres Databases to RDS Using WAL-G with Minimal Downtime

Learn how to Migrate Large Heroku Databases to RDS Using WAL-G with Minimal Downtime.

Heroku is a platform as a service (PaaS) enabling developers to build, run, and operate applications entirely in the cloud. Triumph Technology Solutions LLC, an AWS Consulting Partner and dedicated provider of AWS-centered cloud services, Triumph Tech successfully assisted a customer with migrating a Heroku application to an RDS PostgreSQL Database. This post is meant to outline in detail the steps necessary to execute this migration.

It is worth noting that PostgreSQL is an advanced open-source relational database offered by both AWS, RDS and Amazon Aurora. In this case RDS was leveraged, with its advantages including but not limited to automated provisions, continuous backups and restores, monitoring dashboards, read replicas, Multi-AZ for disaster recovery, maintenance windows for upgrades, and scaling capabilities.

To perform this migration, Triumph Tech first created the log shipped replica from Heroku WAL Logs, and then created the log streamed replica from EC2. Triumph Tech then migrated the Heroku application to the EC2 PostgreSQL database. Triumph then created the database in RDS and initiated the RDS replication, which in turn enabled the migration of the Heroku application to the RDS PostgreSQL database. This sequence will be further detailed below.

Creating the Log Shipped Replica from Heroku WAL Logs

Triumph Tech ran an EC2.yaml CloudFormation template to deploy appropriate EC2 servers. In this case the Testnet was r5.xlarge with 250GB EBS and 2000 PIOPS, and the Prodnet was r5ad.24xlarge with 5TB and 16,000 PIOPS.

Triumph Tech then used the SSM Session Manager to connect to the instance appropriately named “–postgres-master”. Triumph Tech then imported the base backup from Heroku S3 and waited for this to finish. This process can take a relatively long time depending on the size of the database. The appropriate commands are as follows.

Note that you will have to contact heroku support in order to to have the WAL logs and Base Backup made available to you in s3.  

Step 1:

sudo -su postgres
nohup ./usr/local/bin/wal-g-fetch-backup.sh &

Step 2:
With the below commands we configured PostgreSQL to start in recovery and standby mode.

touch /database/recovery.signal
touch /database/standby.signal
rm /database/pg_wal
ln -s /wal/pg_xlog /database/pg_wal

Step 3:
PostgreSQL was started with the command sudo /etc/init.d/postgresql start

Step 4:
Triumph Tech was then able to monitor logs to observe the progress of the Recovery/Replication process using the below commands:

tail -f /var/log/postgresql/postgresql-12-main.log
tail -f /tmp/wal.log

Step 5:
Triumph Tech was able to monitor the replication lag with this query:

select now()-pg_last_xact_replay_timestamp() as replication_lag;

At this point Triumph Tech had created the log shipped replica from Heroku WAL Logs, and was ready to create the log streamed replica from EC2, which you’ll see in the next section is a similar process to the one just described.

Creating the Log Streamed Replica from EC2
Triumph Tech used SSM Session Manager to connect to each replica instance named “–postgres-replica1-5,” and then imported the base backups from EC2 Log-Shipped replica which list importing the base backups from Heroku S3 can take a relatively long time depending on the size of the database. The appropriate commands are as follows.

Step 1:

sudo -su postgres
nohup ./usr/local/bin/physical-backup.sh &

Step 2:
PostgreSQL was configured to start in standby mode:

touch /database/standby.signal
rm -rf /database/pg_wal
ln -s /wal/pg_xlog /database/pg_wal

Step 3:
PostgreSQL was started with the command sudo /etc/init.d/postgresql start

Triumph Tech was then able to monitor logs to observe the progress of the Recovery/Replication process using the below commands:

tail -f /var/log/postgresql/postgresql-12-main.log
tail -f /tmp/wal.log

Step 4:
Triumph Tech was able to monitor the replication lag with this query:

select now()-pg_last_xact_replay_timestamp() as replication_lag;

At this point Triumph Tech had created the log shipped replica from Heroku WAL Logs, and also the log streamed replica was created. Triumph Tech was then ready to migrate the Heroku application to the EC2 PostgreSQL database.

Migrating the Heroku Application to the EC2 PostgreSQL Database

To proceed, Triumph Tech needed to place the Heroku application into maintenance mode using the following command: heroku maintenance:on -a app-name

Triumph Tech then changed the database connection string to point to the new master server and replica(s) on EC2. Doing so required Triumph Tech to wait a few minutes for the replication to complete. Once it was complete, Triumph Tech used the below commands to promote the EC2 Log-Shipped Replica to Master.

sudo -su postgres
pg_ctlcluster 12 main promote

At this point Triumph Tech was ready to restart the Heroku application and take it out of maintenance with the following commands:

heroku ps:restart -a app-name
heroku maintenance:off -a app-name

Now, Triumph Tech was ready to build the database in RDS.

Creating the Database in RDS

Step 1:

Sign in to the AWS console and find the RDS service. Selecting the RDS service will take you to the Amazon RDS dashboard. From the dashboard, select “Create database.” You’ll want to make sure “Standard Create” and “PostgreSQL” are selected before you continue configuring settings. This can be seen directly below.

Step 2:

Using the drop down menu, select the most recent version of PostgreSQL.

Step 3:

For this step, give the database instance a name and create a username and password

Step 4:

Make sure the “DB instance size” and “Storage”

In this case the Testnet was db.r5.xlarge with 250GB EBS and 2000 PIOPS, and the Prodnet was db.r5.24xlarge with 5TB and 16,000 PIOPS. Note that these characteristics are reflective of the EC2 instances launched in the first section where Triumph Tech created the log shipped replica from Heroku WAL logs.

Step 5:

Again, make sure the “Availability & durability” and “Connectivity” default settings are selected.

Step 6:

Within “Additional connectivity configuration” under “Publicly accessible”, select Yes. This is an important step to get right.

Step 7:

Make sure the default settings for “Database authentication” and “Additional configuration” are selected. Once that is verified, we’re ready to create the database

Step 8:

Be sure to setup your security group rules appropriately before attempting to connect to your RDS instance.

Initiating RDS Replication

Step 1:
Triumph Tech then connected to the EC2 Master and created a replication user with the below commands:

sudo -su postgres
psql -d TABLE_NAME
CREATE ROLE rds_replication REPLICATION LOGIN SUPERUSER PASSWORD 'REPLICATION SECRET';
ALTER USER postgres PASSWORD 'EC2 POSTGRES SECRET';
CREATE PUBLICATION <company>_pub FOR ALL TABLES;

Step 2:
A schema-only export of the database was performed:

pg_dump --file schema.bak --port "5432" --username "postgres" --no-password --verbose --format=c --blobs --schema-only "dein9rsa91skmi"

Step 3:
Triumph Tech then created a blank database on the RDS PostgreSQL with the appropriate users and roles:

psql -h <company>-testnet.<confidential>.us-east-1.rds.amazonaws.com
CREATE ROLE collectd WITH
  LOGIN
  NOSUPERUSER
  INHERIT
  NOCREATEDB
  NOCREATEROLE
  NOREPLICATION;

CREATE ROLE "long-queries" WITH
  LOGIN
  NOSUPERUSER
  INHERIT
  NOCREATEDB
  NOCREATEROLE
  NOREPLICATION
  ENCRYPTED PASSWORD '<confidential>';

ALTER ROLE "long-queries" SET statement_timeout TO '750000';

CREATE ROLE "read-only" WITH
  LOGIN
  NOSUPERUSER
  INHERIT
  NOCREATEDB
  NOCREATEROLE
  NOREPLICATION
  PASSWORD 'READ ONLY SECRET';

CREATE ROLE "readonly-shard" WITH
  LOGIN
  NOSUPERUSER
  INHERIT
  NOCREATEDB
  NOCREATEROLE
  NOREPLICATION
  PASSWORD 'READ ONLY SECRET';

CREATE ROLE <role name> WITH
  LOGIN
  NOSUPERUSER
  INHERIT
  NOCREATEDB
  NOCREATEROLE
  NOREPLICATION
  CONNECTION LIMIT 500
  PASSWORD 'APP USER SECRET';

GRANT <role name> to postgres;

 CREATE DATABASE <name>
  WITH
  OWNER = <role name>
  ENCODING = <confidential>
  LC_COLLATE = <confidential>
  LC_CTYPE = <confidential>
  CONNECTION LIMIT = -1;

Step 4:
Triumph Tech restored the database schema to the RDS server:

pg_restore --host "<company>-test-rds.<confidential>.us-east-1.rds.amazonaws.com" --port "XXXX" --username "postgres" --dbname "<anonymous>" --schema-only --verbose schema.bak

Step 5:
Triumph Tech created a subscription on RDS with the below:

psql -h "<company>-test-rds.<confidential>.us-east-1.rds.amazonaws.com"  -d
<anonymous>
CREATE SUBSCRIPTION <company>_sub
CONNECTION 'host=<EC2 MASTER PRIVATE IP> port=XXXX dbname=<anonymous>
user=rds_replication password=<RDS REPLICATION SECRET>'
PUBLICATION <company>_pub;

At this point the migration was complete and Triumph Tech was ready to migrate the Heroku application to the RDS PostgreSQL database.

Migrating the Heroku Application to the RDS PostgreSQL Database

Triumph Tech placed the Heroku application into maintenance with the following command: heroku maintenance:on -a app-name

Triumph Tech then changed the database connection string to point to the new master server and replica(s) on RDS. Note that it can take a few minutes for the replication to complete. Once complete, we connected to the EC2 master:

sudo -su postgres
psql -h <company>-testnet.<confidential>.us-east-1.rds.amazonaws.com -d
<anonymous>
DROP SUBSCRIPTION <company>_sub;```

Triumph Tech then created a sequence script and saved it as alter_sequences.sql. With the below command Triumph Tech was able to recreate sequences from the script:

psql -h <company>-testnet.<confidential>.us-east-1.rds.amazonaws.com -d <anonymous> -f alter_sequences.sql

Triumph Tech was finally ready to restart the Heroku application and take it out of maintenance:

heroku ps:restart -a app-name
heroku maintenance:off -a app-name

You now have learned how to Migrate Large Heroku Databases to RDS Using WAL-G!






View more articles  View more articles 

Leave a Reply

Your email address will not be published. Required fields are marked *