VMware Postgres High Availability with pg_auto_failover
VMware Postgres includes pg_auto_failover, an open source extension for PostgreSQL. pg_auto_failover facilitates the creation of a High Availability (HA) configuration, monitors replication between Postgres instances, and manages automatic failover for a group of Postgres nodes.
pg_auto_failover was introduced in VMware Postgres 10.16.1, and replaces Patroni for all future VMware Postgres releases.
This page describes:
- Creating a High Availability cluster using pg_auto_failover.
- Migrating from Patroni High Availability to pg_auto_failover for existing Patroni customers who wish to move to a VMware Postgres release that supports pg_auto_failover for HA.
The pg_auto_failover architecture requires three key components as a minimum:
- a pg_auto_failover monitor node that acts both as a witness and an orchestrator
- a Postgres primary node
- a Postgres secondary node, using by default a synchronous hot standby setup
This architecture guarantees availability of the Postgres service to users and applications, while automating maintenance operations.
For more details on the key architecture concepts of pg_auto_failover, see Architecture Basics in the pg_auto_failover documentation.
pg_auto_failover is included with the VMware Postgres release, and installed during the
rpm package installation. For details see Installing the Postgres Server RPM.
Install the VMware Postgres release on all three nominated hosts, the monitor, primary, and secondary.
pg_auto_failover uses the
pg_autoctl utility with subcommands to initialize, manage, and orchestrate the replicated environment. Login to each host, and confirm the pg_auto_failover installation by using:
$ pg_autoctl --version
For the pg_autoctl command reference see Main pg_autoctl commands.
Prior to creating the HA architecture in your environment, review the important decision points discussed in Configuring pg_auto_failover.
pg_auto_failover relies on the creation of a monitor service that manages one or several HA formations. A formation can contain a group of two or more Postgres nodes.
In the following examples, replace the example hostnames
HA-SECONDARY with your network’s fully qualified hostnames.
Create the monitor service, similar to:
$ export PGDATA=/monitor $ pg_autoctl create monitor --no-ssl --auth trust --run
PGDATA=/monitoris set here for example clarity. Alter the
PGDATAenvironment variable to reflect the data location on your monitor host.
IMPORTANT: This example creates a monitor in a test environment with no security restrictions (
--auth trust), which is not a recommended practice for production environments.
Add the new primary and secondary nodes to the formation.
On the monitor node, obtain the monitor connection URI, which is used to add the primary and the secondary nodes to the formation.
$ pg_autoctl show uri --monitor postgres://autoctl_node@MONITOR-HOST/pg_auto_failover?sslmode=prefer
Login to the primary node, and bring up a Postgres instance, adding it to the HA formation:
$ pg_autoctl create postgres \ --auth trust \ --ssl-self-signed \ --pgdata /primary_data \ --monitor postgres://autoctl_node@MONITOR-HOST/pg_auto_failover?sslmode=prefer \ --run
NOTE: When creating the instance, ensure that the
$PGDATAdirectory is not a mount point, and that it can be recreated and deleted by the pg_autofailover user and Postgres. If
$PGDATAis a mountpoint for a filesystem, change it to be a directory in the mounted filesystem. For example: if a filesystem is mounted to
$PGDATAis set to
Login to the secondary node, which does not contain any database files and is not initialized, and add it to the formation, similar to:
$ pg_autoctl create postgres \ --auth trust \ --ssl-self-signed \ --pgdata /secondary_data \ --monitor postgres://autoctl_node@MONITOR-HOST/pg_auto_failover?sslmode=prefer \ --run
On the monitor, confirm the primary and secondary node architecture, similar to:
$ pg_autoctl show state --pgdata monitor Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State -------+-------+-------------------+-----------+-----------+-------------------+------------------ node_1 | 1 | HA-PRIMARY:5432 | 0/3000060 | yes | primary | primary node_2 | 2 | HA-SECONDARY:5432 | 0/3000060 | yes | secondary | secondary
For further details, refer to pg_autoctl show state.
Verify the default formation and the postgres uri used to connect to the pg_auto_failover nodes.
$ pg_autoctl show uri Type | Name | Connection String -----------+---------+------------------------------- monitor | monitor | postgres://autoctl_node@MONITOR-HOST:5432/pg_auto_failover?sslmode=prefer formation | default |
For details see pg_autoctl show uri.
For examples of failover scenarios, refer to the pg_auto_failover documentation.
This topic describes the migration process from an existing Tanzu Postgres Patroni High Availability (HA) solution, to pg_auto_failover HA.
pg_auto_failover is the only supported HA solution from VMware Postgres 10.16.1.
IMPORTANT: Moving from a Patroni HA configuration to pg_auto_failover requires system downtime.
- Create new 10.18 nodes and move data from the old primary to the new.
- Create the pg_auto_failover monitor and add the new primary and secondary to the formation.
- Adjust the pg_auto_failvoer settings to match your old Patroni environment.
The documented migration process is based on a two node Patroni HA installation example, using VMware Postgres 10.15.0 instances. These prerequisites apply to the example scenario, so you must adjust the requirements to suit your environment. To upgrade a two-node VMware Postgres installation you require:
- A pre-arranged maintenance window to perform the migration. During that period, stop any client access to the VMware Postgres Patroni cluster, to avoid any data modification.
- Three newly provisioned hosts for the pg_auto_failover architecture: the monitor, the primary, and the secondary. The new hosts should have network access to the old hosts, and adequate data storage capacity to host the existing Postgres 10.15.0 databases.
- Database superuser admin access to old and new hosts.
- Familiarity with Postgres tools like
- Familiarity with the existing Patroni architecture and some familiarity with pg_auto_failover concepts and architecture.
Create new nodes and move the old data
(Optional) Take a backup of the Patroni cluster primary node, to ensure you can recover the initial state of the 10.15.0 database node.
(Optional) Save your Patroni configuration files on your local client. These settings will be used as a reference to configure your new pg_auto_failover environment.
Stop the Patroni cluster and the VMware Postgres 10.15.0 instances.
Install the VMware Postgres 10.18.0 release on each of the new nodes: primary, secondary, and pg_auto_failover monitor. For details see Installing the Postgres Server RPM.
Initialize and start the 10.18.0 VMware Postgres instance on the new primary node.
$ initdb -D /new_primary_data
/new_primary_datais the new database file location. Replace with your own values.
$ pg_ctl -D /new_primary_data start
From the new primary node, take a copy of the 10.15.0 data files using
pg_dumpallwith a connection URI similar to:
$ pg_dumpall --dbname=postgres://<user>@<IP-address-old-primary>:<port-old-primary>/postgres?sslmode=prefer -f /home/postgres/primary_data.sql
where the URI protocol notation is in the format of
<user>@<host>:<port>/<dbname>?optionkey=optionvalue. For more details on the URI parameters, refer to Connection URIs in the PostgreSQL documentation.
For more details on pg_dumpall, refer to the PostgreSQL pg_dumpall documentation.
Using your preferred copy method (for example,
scp), copy the
/home/postgres/primary_data.sqlfile from the old primary node to the new primary. In this example, we used
/tmp/new_primary_data.sqlas the target location in the new primary.
On the new primary node, use
psqlto restore the data files, similar to:
$ psql postgres://<user>@<IP-of-new-primary>:<port of new primary>/postgres?sslmode=prefer -f /tmp/new_primary_data.sql
/tmp/new_primary_data.sqlis the source file for the restore.
Inspect the data by running test queries.
Bring down the new 10.18.0 instance.
Create the pg_auto_failover monitor and add the new nodes to the formation
The following examples do not use any strict security settings. Use them only as guidelines.
On the new monitor host, create the pg_auto_failover monitor:
$ pg_autoctl create monitor \ --auth trust \ --ssl-self-signed \ --pgdata /monitor \ --run
Add the new primary node to the HA formation. Login to the new primary host, and execute
pg_autoctl create postgressimilar to:
$ pg_autoctl create postgres \ --auth trust \ --ssl-self-signed \ --monitor postgres://autoctl_node@MONITOR-HOST/pg_auto_failover?sslmode=prefer \ --run
MONITOR-HOSTis the monitor node example hostname, and 5000 is the example port the monitor instance is running on.
Login to the secondary node, which does not contain any database files and is not initialized, and add it to the HA formation, similar to:
$ pg_autoctl create postgres \ --auth trust \ --ssl-self-signed \ --pgdata /new_secondary_data \ --monitor postgres://autoctl_node@MONITOR-HOST/pg_auto_failover?sslmode=prefer \ --run
Verify the state of the configuration, using
pg_autoctl show state.
Create a test table on the new primary node and verify that it appears on the secondary node.
Adjust the pg_auto_failvoer settings to match your old Patroni environment, where applicable. Patroni configuration parameters do not map one to one with pg_auto_failover setup parameters.
Establish client connectivity to the new pg_auto_failover cluster and test your client applications.
When all tests are complete, you may drop the old hosts and their data.