Upgrading VMware Postgres in a Patroni Cluster
This topic provides best practices and steps to perform a VMware Postgres upgrade in a Patroni cluster. The instructions help you perform a minor version upgrade, for example 11.7 to 11.8, by the fastest means possible, and with minimal to no downtime. Major version upgrades, for example 11.7 to 12.0, are not covered in this document.
Prerequisites
- This document provides instructions for RHEL-based distributions, such as Red Hat, CentOS, and Oracle Enterprise Linux. Commands for Debian-based distributions such as Ubuntu vary and are not covered here.
- The instructions assume that etcd is the DCS and the default ports were used when setting up your etcd cluster.
- The instructions were tested with an HA cluster configured per Setting Up a Patroni Cluster with VMware Postgres.
PostgreSQL minor updates, such as 11.7 to 11.8, are binary updates only. The PostgreSQL community has an excellent reputation for making minor upgrades as painless as possible. However, always read the release notes for the new Postgres version, especially if you are skipping minor versions, such as 11.5 to 11.8.
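Before planning the upgrade, it can help to confirm the currently-installed server version. The following is a minimal sketch that assumes local psql access as the postgres user:
$ psql -d postgres -c "SHOW server_version;"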
Best Practices for Upgrading VMware Postgres
Restrict Updates
When updating Postgres packages and dependencies, it is best to focus on only the necessary packages. Running yum -y update is not recommended because there may be many available updates, lengthening the upgrade process. Yum updates may also include a new kernel, requiring a reboot.
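As a hedged example, you can list pending updates for only the VMware Postgres package before upgrading; the package name pattern shown here is an assumption and may differ in your environment:
$ sudo yum check-update 'vmware-postgres*'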
pgbouncer
If you have set up pgbouncer running on a single Postgres host, pause pgbouncer before you upgrade.
Following is the recommended upgrade process when you are using pgbouncer:
- Connect to pgbouncer and execute the PAUSE command to suspend all future client connections (a sketch of issuing the commands from the pgbouncer admin console follows this list). Note: You must maintain the connection to pgbouncer after issuing PAUSE to ensure that pgbouncer does not un-pause.
- Start a second terminal and wait for all existing client queries to complete.
- Create a backup of the cluster using your preferred method.
- Upgrade the Patroni Postgres cluster, as described in Upgrading VMware Postgres with Patroni.
- Back in the first terminal, execute the pgbouncer RESUME command.
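The following is a minimal sketch of issuing PAUSE and RESUME from the pgbouncer admin console. It assumes pgbouncer listens on its default port 6432 and that the connecting user is listed in admin_users in pgbouncer.ini; adjust the host, port, and user for your setup.
$ psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer
pgbouncer=# PAUSE;
PAUSE
-- keep this session open during the upgrade, then:
pgbouncer=# RESUME;
RESUME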
HAProxy
If you have set up HAProxy, validate your HAProxy configuration and operation before starting the upgrade. If the read-write pool for HAProxy is not functioning properly, clients will lose connectivity to the primary database at some point during the upgrade.
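As a hedged example, the following checks the HAProxy configuration syntax and then lists backend status over the stats socket. The configuration path, socket path, and use of socat are assumptions; substitute the locations and tools from your own deployment.
$ sudo haproxy -c -f /etc/haproxy/haproxy.cfg
$ echo "show stat" | sudo socat stdio /var/run/haproxy.sock | cut -d',' -f1,2,18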
Upgrading VMware Postgres with Patroni
This is the general process for upgrading a Patroni/etcd Postgres cluster:
- Suspend Patroni failover.
- Execute the upgrade steps for a single Postgres node on each standby Postgres node.
- Resume Patroni failover.
- Manually switch the Postgres primary to an upgraded standby node.
- Suspend Patroni failover.
- Execute the upgrade steps for a single Postgres node on the former primary node.
- Resume Patroni failover.
- Optionally, switch the Postgres primary back to the original node.
Upgrading Standby Nodes
When upgrading VMware Postgres, always begin with the standby nodes. This helps you identify problems with the upgrade without affecting the primary node. Do not upgrade the primary node until the standby nodes are upgraded and validated. If you encounter any problems while performing the upgrade, stop and debug.
Procedure:
Download the RPM for the new version of VMware Postgres from VMware Tanzu Network and copy it to all of the nodes.
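For example, you could copy the RPM to each node with scp; the host names below are placeholders for your own node addresses, and the RPM file name should match the version you downloaded:
$ for host in postgresql0 postgresql1 postgresql2; do scp vmware-postgres-11.7.4.el7.x86_64.rpm "${host}:/tmp/"; done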
Disable Patroni automatic failover. This enables Patroni maintenance mode.
$ patronictl -c postgresql0.yml pause
Success: cluster management is paused
$ patronictl -c postgresql0.yml list
+--------------+-------------+----------------+--------+---------+----+-----------+
| Cluster      | Member      | Host           | Role   | State   | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  2 |           |
| patroni_test | postgresql1 | 10.193.102.204 |        | running |  2 |         0 |
| patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
+--------------+-------------+----------------+--------+---------+----+-----------+
Maintenance mode: on
Wait for all client queries to finish. You can monitor with ps -ef | grep postgres, or use the following database query to find active queries.
$ psql -d postgres -c "SELECT * FROM pg_stat_activity WHERE state='active' AND backend_type != 'walsender' AND pid != pg_backend_pid();"
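If you prefer to wait automatically, the following is a minimal sketch that polls until no active client queries remain; it assumes local access to the postgres database as a superuser.
# poll every 5 seconds until the active client query count reaches zero
while [ "$(psql -d postgres -At -c "SELECT count(*) FROM pg_stat_activity WHERE state='active' AND backend_type != 'walsender' AND pid != pg_backend_pid();")" -gt 0 ]; do
  sleep 5
done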
Log in to the first standby node and stop the Postgres server.
# sudo su - postgres
$ pg_ctl -D /var/lib/pgsql/data stop -m fast
waiting for server to shut down.... done
server stopped
Run the yum update command to upgrade VMware Postgres.
# sudo yum update -y vmware-postgres-11.7.4.el7.x86_64.rpm
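As an optional check before moving on, you can confirm that the node now has the upgraded binaries; this sketch assumes the VMware Postgres binaries are on the postgres user's PATH and that the package is named vmware-postgres:
$ postgres --version
$ rpm -q vmware-postgres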
Repeat the previous two steps, stopping Postgres and running yum update, on each of the remaining standby nodes.
Log in to the primary host and view the cluster status.
$ patronictl -c postgresql0.yml list
+--------------+-------------+----------------+--------+---------+----+-----------+
| Cluster      | Member      | Host           | Role   | State   | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  2 |           |
| patroni_test | postgresql1 | 10.193.102.204 |        | stopped |    |   unknown |
| patroni_test | postgresql2 | 10.193.102.205 |        | stopped |    |   unknown |
+--------------+-------------+----------------+--------+---------+----+-----------+
Maintenance mode: on
Resume Patroni failover.
$ patronictl -c postgresql0.yml resume
Success: cluster management is resumed
$ patronictl -c postgresql0.yml list
+--------------+-------------+----------------+--------+---------+----+-----------+
| Cluster      | Member      | Host           | Role   | State   | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  2 |           |
| patroni_test | postgresql1 | 10.193.102.204 |        | running |  2 |         0 |
| patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
+--------------+-------------+----------------+--------+---------+----+-----------+
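Optionally, you can confirm that both standbys are streaming from the leader again; the following sketch assumes it is run on the current leader node:
$ psql -d postgres -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"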
Upgrading the Primary Node
After the standby nodes are upgraded and verified, you can upgrade the current primary node.
Switch the primary to one of the upgraded Postgres instances.
$ patronictl -c postgresql0.yml switchover
Master [postgresql0]:
Candidate ['postgresql1', 'postgresql2'] []: postgresql1
When should the switchover take place (e.g. 2020-07-30T15:57 )  [now]: now
Current cluster topology
+--------------+-------------+----------------+--------+---------+----+-----------+
| Cluster      | Member      | Host           | Role   | State   | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  2 |           |
| patroni_test | postgresql1 | 10.193.102.204 |        | running |  2 |         0 |
| patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
+--------------+-------------+----------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster patroni_test, demoting current master postgresql0? [y/N]: y
2020-07-30 14:58:11.56393 Successfully switched over to "postgresql1"
+--------------+-------------+----------------+--------+---------+----+-----------+
| Cluster      | Member      | Host           | Role   | State   | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 |        | stopped |    |   unknown |
| patroni_test | postgresql1 | 10.193.102.204 | Leader | running |  2 |           |
| patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
+--------------+-------------+----------------+--------+---------+----+-----------+
Again, pause Patroni failover so that Patroni will not try to restart Postgres during the upgrade.
$ patronictl -c postgresql0.yml pause
Success: cluster management is paused
Stop the Postgres server on the now former primary node.
$ pg_ctl -D /var/lib/pgsql/data stop -m fast
waiting for server to shut down.... done
server stopped
Run the yum update command to upgrade VMware Postgres.
# sudo yum update -y vmware-postgres-11.7.4.el7.x86_64.rpm
Resume Patroni failover.
$ patronictl -c postgresql0.yml resume
Success: cluster management is resumed
$ patronictl -c postgresql0.yml list
+--------------+-------------+----------------+--------+---------+----+-----------+
| Cluster      | Member      | Host           | Role   | State   | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 |        | running |  2 |           |
| patroni_test | postgresql1 | 10.193.102.204 | Leader | running |  2 |         0 |
| patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
+--------------+-------------+----------------+--------+---------+----+-----------+
(Optional) Switch the primary back to the original primary node.
$ patronictl -c postgresql0.yml switchover
Master [postgresql1]:
Candidate ['postgresql0', 'postgresql2'] []: postgresql0
When should the switchover take place (e.g. 2020-07-31T10:14 )  [now]:
Current cluster topology
+--------------+-------------+----------------+--------+---------+----+-----------+
| Cluster      | Member      | Host           | Role   | State   | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 |        | running |  3 |         0 |
| patroni_test | postgresql1 | 10.193.102.204 | Leader | running |  3 |           |
| patroni_test | postgresql2 | 10.193.102.205 |        | running |  3 |         0 |
+--------------+-------------+----------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster patroni_test, demoting current master postgresql1? [y/N]: y
2020-07-31 09:14:39.12718 Successfully switched over to "postgresql0"
+--------------+-------------+----------------+--------+---------+----+-----------+
| Cluster      | Member      | Host           | Role   | State   | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  3 |           |
| patroni_test | postgresql1 | 10.193.102.204 |        | stopped |    |   unknown |
| patroni_test | postgresql2 | 10.193.102.205 |        | running |  3 |         0 |
+--------------+-------------+----------------+--------+---------+----+-----------+
When all nodes are running, the upgrade is complete.
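As a final hedged check, you can confirm that every node reports the expected server version. The addresses below are the example cluster members used throughout this topic, and the sketch assumes your pg_hba.conf allows these connections without an interactive password prompt:
$ for host in 10.193.102.203 10.193.102.204 10.193.102.205; do psql -h "$host" -d postgres -At -c "SHOW server_version;"; done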