Upgrading VMware Postgres in a Patroni Cluster

This topic provides best practices and steps for performing a VMware Postgres upgrade in a Patroni cluster. The instructions help you perform a minor version upgrade, for example 11.7 to 11.8, as quickly as possible and with minimal to no downtime. Major version upgrades, for example 11.7 to 12.0, are not covered in this document.

Prerequisites

  • This document provides instructions for RHEL-based distributions, such as Red Hat, CentOS, and Oracle Enterprise Linux. Commands for Debian-based distributions such as Ubuntu vary and are not covered here.
  • The instructions assume that etcd is the DCS (Distributed Configuration Store) and that the default ports were used when setting up your etcd cluster.

The instructions were tested with an HA cluster configured per Setting Up a Patroni Cluster with VMware Postgres.

PostgreSQL minor updates, such as 11.7 to 11.8, are binary updates only. The PostgreSQL community has an excellent reputation for making minor upgrades as painless as possible. However, always read the release notes for the new Postgres version, especially if you are skipping minor versions, such as 11.5 to 11.8.
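
Before starting, it can help to confirm that the planned upgrade really is a minor one. The sketch below assumes PostgreSQL 10 or later, where the first number is the major version; `is_minor_upgrade` is an illustrative helper, not a VMware tool:

```shell
#!/usr/bin/env bash
# Decide whether a target version is a minor upgrade from the current one.
# For PostgreSQL 10+, the first number is the major version; minor releases
# change only the second number (e.g. 11.7 -> 11.8). Pre-10 versions
# (e.g. 9.6.x) use a two-part major version and are not handled here.
is_minor_upgrade() {
  local from="$1" to="$2"
  # compare the text before the first dot (the major version)
  [ "${from%%.*}" = "${to%%.*}" ]
}

if is_minor_upgrade "11.7" "11.8"; then
  echo "minor upgrade: safe to follow this procedure"
else
  echo "major upgrade: not covered by this document"
fi
# prints "minor upgrade: safe to follow this procedure"
```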

Best Practices for Upgrading VMware Postgres

Restrict Updates

When updating Postgres packages and dependencies, it is best to focus on only the necessary packages. Running yum -y update is not recommended because there may be many available updates, lengthening the upgrade process. Yum updates may also include a new kernel, requiring a reboot.

pgbouncer

If you have set up pgbouncer running on a single Postgres host, pause pgbouncer before you upgrade.

Following is the recommended upgrade process when you are using pgbouncer.

  1. Connect to pgbouncer and execute the PAUSE command to suspend all future client connections.

    Note: You must maintain the connection to pgbouncer after issuing PAUSE to ensure that pgbouncer does not un-pause.

  2. Start a second terminal and wait for all existing client queries to complete.

  3. Create a backup of the cluster using your preferred method.

  4. Upgrade the Patroni Postgres cluster, as described in Upgrading VMware Postgres with Patroni.

  5. Back in the first terminal, execute the pgbouncer RESUME command.
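
Step 2 above, waiting for client queries to drain, can be scripted. The following sketch polls a caller-supplied counting command until it reports zero; `wait_for_idle` and its timeout handling are illustrative, and in practice the counting command might be a `psql` query against `pg_stat_activity`:

```shell
#!/usr/bin/env bash
# Poll until a counting command reports zero active queries, or give up
# after the timeout (in seconds). The counting command is a parameter so
# this sketch can be exercised without a live database.
wait_for_idle() {
  local count_active_cmd="$1" timeout="${2:-60}"
  local i n
  for ((i = 0; i < timeout; i++)); do
    n="$($count_active_cmd)"
    if [ "${n:-1}" -eq 0 ]; then
      return 0          # no active queries: safe to proceed
    fi
    sleep 1
  done
  return 1              # queries still running after the timeout
}

# A real counting command could be (illustrative):
#   psql -At -d postgres -c "SELECT count(*) FROM pg_stat_activity
#     WHERE state='active' AND pid <> pg_backend_pid();"
```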

HAProxy

If you have set up HAProxy, it is important to validate your HAProxy configuration and operation before starting the upgrade. If the read-write pool for HAProxy is not functioning properly, clients will lose connectivity to the primary database at some point during the upgrade.
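
One way to validate the read-write pool is to connect through HAProxy and confirm that the session lands on the primary, where `pg_is_in_recovery()` returns `f`. In this sketch the probe command is a parameter so the check stays testable, and the endpoint shown in the usage comment is hypothetical:

```shell
#!/usr/bin/env bash
# Check that a probe command, run through the HAProxy read-write frontend,
# reaches the primary. On the primary, pg_is_in_recovery() prints "f".
rw_pool_ok() {
  local probe_cmd="$1"
  [ "$($probe_cmd)" = "f" ]
}

# Example probe (hypothetical HAProxy endpoint haproxy.example.com:5000):
#   rw_pool_ok 'psql -At -h haproxy.example.com -p 5000 -d postgres \
#     -c "SELECT pg_is_in_recovery();"'
```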

Upgrading VMware Postgres with Patroni

This is the general process for upgrading a Patroni/etcd Postgres cluster:

  1. Suspend Patroni failover.
  2. Execute the upgrade steps for a single Postgres node on each standby Postgres node.
  3. Resume Patroni failover.
  4. Manually switch the Postgres primary to an upgraded standby node.
  5. Suspend Patroni failover.
  6. Execute the upgrade steps for a single Postgres node on the former primary node.
  7. Resume Patroni failover.
  8. Optionally, switch the Postgres primary back to the original node.
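
Several of these steps end with a `patronictl list` check. If you script the process, a small helper can parse that table and fail when any member is not in the `running` state. This is a sketch; `all_members_running` is an illustrative name, and the field positions match the table layout shown in the examples in this topic:

```shell
#!/usr/bin/env bash
# Read a patronictl list table on stdin and succeed only when every
# member row reports the "running" state.
all_members_running() {
  # member rows have 7 "|"-delimited cells (9 awk fields including the
  # empty leading/trailing fields); skip the header row, then fail if any
  # State cell (field 6) is not "running"
  awk -F'|' '
    NF >= 9 && $2 !~ /Cluster/ {
      gsub(/ /, "", $6)
      if ($6 != "running") bad = 1
    }
    END { exit bad }
  '
}

# usage:
#   patronictl -c postgresql0.yml list | all_members_running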

Upgrading Standby Nodes

When upgrading , always begin with the standby nodes. It helps to identify problems with the upgrade without affecting the primary node. Do not upgrade the primary node until the standby nodes are upgraded and validated. If you encounter any problems while performing the upgrade, stop and debug.

Procedure:

  1. Download the RPM for the new version of from VMware Tanzu Network and copy it to all of the nodes.

  2. Disable Patroni automatic failure. This enables Patroni maintance mode.

    $ patronictl -c postgresql0.yml pause
    
    Success: cluster management is paused 
    
    $ patronictl -c postgresql0.yml list 
    
    +--------------+-------------+----------------+--------+---------+----+-----------+
    |   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    | patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  2 |           |
    | patroni_test | postgresql1 | 10.193.102.204 |        | running |  2 |         0 |
    | patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
    +--------------+-------------+----------------+--------+---------+----+-----------+
     Maintenance mode: on
    
  3. Wait for all client queries to finish. You can monitor with ps -ef | grep postgres, or use the following database query to find active queries.

    $ psql -d postgres -c "SELECT * FROM pg_stat_activity 
        WHERE state='action' and backend_type != 'walsender' 
        AND pid != pg_backend_pid();"
    
  4. Log in to the first standby node and stop the Postgres server.

    # sudo su - postgres
    $ pg_ctl -D /var/lib/pgsql/data stop -m fast
    
    waiting for server to shut down.... done
    server stopped
    
  5. Run the yum update command to upgrade .

    # sudo yum update -y vmware-postgres-11.7.4.el7.x86_64.rpm
    
  6. Repeat steps 4 and 5 for the remaining standby nodes.

  7. Log in to the primary host and view the cluster status.

    $ patronictl -c postgresql0.yml list
    +--------------+-------------+---------------+---------+---------+----+-----------+
    |   Cluster    |    Member   |      Host     |  Role   |  State  | TL | Lag in MB |
    +--------------+-------------+---------------+---------+---------+----+-----------+
    | patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  2 |           |
    | patroni_test | postgresql1 | 10.193.102.204 |        | stopped |    |   unknown |
    | patroni_test | postgresql2 | 10.193.102.205 |        | stopped |    |   unknown |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    Maintenance mode: on
    
  8. Resume Patroni failover.

    $ patronictl -c postgresl0.yml resume
    
    Success: cluster management is resumed
    
    $ patronictl -c postgres0.yml list
    
    +--------------+-------------+----------------+--------+---------+----+-----------+
    |   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    | patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  2 |           |
    | patroni_test | postgresql1 | 10.193.102.204 |        | running |  2 |         0 |
    | patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    

Upgrading the Primary Node

After the standby nodes are upgraded and verified, you can upgrade the current primary node.

  1. Switch the primary to one of the upgraded Postgres instances.

    $ patronictl -c postgresql0.yml switchover
    Master [postgresql0]:
    Candidate ['postgresql1', 'postgresql2'] []: postgresql1
    When should the switchover take place (e.g. 2020-07-30T15:57 )  [now]: now
    
    Current cluster topology
    +--------------+-------------+----------------+--------+---------+----+-----------+
    |   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    | patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  2 |           |
    | patroni_test | postgresql1 | 10.193.102.204 |        | running |  2 |         0 |
    | patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    Are you sure you want to switchover cluster patroni_test, demoting current master postgresql0? [y/N]: y
    
    2020-07-30 14:58:11.56393 Successfully switched over to "postgresql1"
    +--------------+-------------+----------------+--------+---------+----+-----------+
    |   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    | patroni_test | postgresql0 | 10.193.102.203 |        | stopped |    |   unknown |
    | patroni_test | postgresql1 | 10.193.102.204 | Leader | running |  2 |           |
    | patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    
  2. Again, pause Patroni failover so that Patroni will not try to restart Postgres during the upgrade.

    $ patronictl -c postgresql0.yml pause
    
    Success: cluster management is paused
    
  3. Stop the Postgres server on the now former primary node.

    $ pg_ctl -D /var/lib/pgsql/data stop -m fast
    
    waiting for server to shut down.... done
    server stopped
    
  4. Run the yum update command to upgrade .

     # sudo yum update -y vmware-postgres-11.7.4.el7.x86_64.rpm
    
  5. Resume Patroni failover.

    $ patronictl -c postgresql0.yml resume
    
    Success: cluster management is resumed
    
    $ patronictl -c postgres0.yml list
    
    +--------------+-------------+----------------+--------+---------+----+-----------+
    |   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    | patroni_test | postgresql0 | 10.193.102.203 |        | running |  2 |           |
    | patroni_test | postgresql1 | 10.193.102.204 | Leader | running |  2 |         0 |
    | patroni_test | postgresql2 | 10.193.102.205 |        | running |  2 |         0 |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    
  6. (Optional) Switch the primary back to the original primary node.

    $ patronictl -c postgresql0.yml switchover
    Master [postgresql1]:
    Candidate ['postgresql0', 'postgresql2'] []: postgresql0
    When should the switchover take place (e.g. 2020-07-31T10:14 )  [now]:
    Current cluster topology
    +--------------+-------------+----------------+--------+---------+----+-----------+
    |   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    | patroni_test | postgresql0 | 10.193.102.203 |        | running |  3 |         0 |
    | patroni_test | postgresql1 | 10.193.102.204 | Leader | running |  3 |           |
    | patroni_test | postgresql2 | 10.193.102.205 |        | running |  3 |         0 |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    Are you sure you want to switchover cluster patroni_test, demoting current master postgresql1? [y/N]: y
    2020-07-31 09:14:39.12718 Successfully switched over to "postgresql0"
    +--------------+-------------+----------------+--------+---------+----+-----------+
    |   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    | patroni_test | postgresql0 | 10.193.102.203 | Leader | running |  3 |           |
    | patroni_test | postgresql1 | 10.193.102.204 |        | stopped |    |   unknown |
    | patroni_test | postgresql2 | 10.193.102.205 |        | running |  3 |         0 |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    

When all nodes are running, the upgrade is complete.