Setting Up a Patroni Cluster with VMware Postgres

VMware Postgres includes the Patroni High Availability (HA) solution. This topic describes how to set up an HA three-node cluster using Patroni and etcd. If a host becomes unavailable, this configuration provides failover to a standby host.

Patroni is a tool that can be used to create, manage, maintain, and monitor highly available clusters using Postgres streaming replication. Patroni handles the Postgres database initialization as well as planned switchovers or unplanned failovers.

This topic shows how to set up a three-node CentOS 7 HA cluster using Patroni and etcd. In this three-node cluster a Patroni instance on each node initializes and manages its own Postgres instance and monitors its health. Patroni uses health data to decide if any actions are required to keep the Patroni cluster healthy as a whole.

Only one Patroni instance at a time can acquire and hold the “leader lock” (primary role). This is where etcd comes in. etcd is a Distributed Consensus Store (DCS): it provides the distributed decision-making mechanism that determines which Patroni instance should be the leader and which instances should be standbys.
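Conceptually, the leader lock behaves like a key in the DCS with a time-to-live (TTL): an instance becomes leader by creating the key while it is absent, and keeps the role only by refreshing the key before the TTL expires. The following Python sketch illustrates that compare-and-set behavior; the class and method names are illustrative, not Patroni's actual API. (The 30-second TTL mirrors the `ttl: 30` setting used later in the Patroni configuration.)

```python
import time

class LeaderLock:
    """Toy model of a DCS leader key with a TTL (illustrative only)."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, member, now=None):
        """Take the lock if it is free or expired; refresh it if we already hold it."""
        now = time.monotonic() if now is None else now
        if self.holder is None or now >= self.expires_at or self.holder == member:
            self.holder = member
            self.expires_at = now + self.ttl
            return True
        return False  # another member holds an unexpired lock

lock = LeaderLock(ttl_seconds=30)
lock.acquire("postgresql0", now=0)   # becomes leader
lock.acquire("postgresql1", now=10)  # rejected: lock still held
lock.acquire("postgresql1", now=31)  # succeeds: the previous lease expired
```

A healthy leader refreshes its key on every loop iteration, so the lease only expires when the leader has actually stopped responding.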

This example creates a three-node etcd cluster. (Apache ZooKeeper can also be used with Patroni but is not covered here.) The number of etcd nodes does not need to match the number of Patroni nodes in the cluster. However, you should have more than one etcd node to prevent a single point of failure, and it is best to have at least three etcd nodes to ensure a majority during leader voting.
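The majority requirement is simple arithmetic: a cluster of n members needs floor(n/2) + 1 votes for a majority, so it tolerates floor((n-1)/2) member failures. A quick sketch:

```python
def quorum(members: int) -> int:
    """Votes needed for a majority in an etcd cluster of `members` nodes."""
    return members // 2 + 1

def tolerated_failures(members: int) -> int:
    """Members that can fail while the cluster still keeps a majority."""
    return (members - 1) // 2

for n in (1, 2, 3, 5):
    print(n, quorum(n), tolerated_failures(n))
# A 2-node cluster tolerates no failures at all, which is why this
# topic uses a minimum of three etcd nodes.
```

Note that adding a second node actually adds a point of failure without adding fault tolerance; three is the smallest size that survives losing a member.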

About Postgres Replication

A Patroni HA cluster relies on Postgres replication to maintain the standby Postgres databases.

Postgres replication streams write-ahead logs (WAL) from a primary Postgres server to one or more standby Postgres servers. One Postgres server is designated the primary and one or more others are standbys.

Any database changes (DML operations) made in the primary server are first written to an in-memory WAL buffer. Changes are then immediately written to a WAL segment file when the transaction commits or aborts. The WAL segment files are in the primary cluster’s pg_wal directory.

After the WAL data is written, the primary cluster’s walsender process sends it to the standbys’ walreceiver processes. Each standby’s walreceiver process writes the WAL data into WAL segment files in the standby’s pg_wal directory and notifies the startup process. The startup process then replays the WAL data, much like a recovery process, bringing the standby in sync with the primary.

The following ps output shows the walsender processes on the primary host, and the walreceiver and startup processes on a standby host.

Primary:

postgres 12391 12347  0 17:07 ?        00:00:00 postgres: patroni_test: walsender replicator 10.193.102.204(29906) streaming 0/4132360
postgres 14013 12347  0 19:05 ?        00:00:00 postgres: patroni_test: walsender replicator 10.193.102.205(33855) streaming 0/4132360

Standby:

postgres 11916 11914  0 17:07 ?        00:00:00 postgres: patroni_test: startup   recovering 0000000A0000000000000004
postgres 11928 11914  0 17:07 ?        00:00:09 postgres: patroni_test: walreceiver   streaming 0/4132360
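Conceptually, the stream shown above is an append-only log: the primary appends committed changes, and each standby replays any records past its current position. The following simplified Python sketch models that flow; it is illustrative only and bears no resemblance to the actual WAL record format.

```python
class Node:
    """Toy model of WAL shipping: a shared log replayed on a standby."""
    def __init__(self):
        self.data = {}      # table state
        self.applied = 0    # replay position (loosely, a WAL LSN)

    def replay(self, wal):
        # Apply records past our position, like the startup process does.
        for key, value in wal[self.applied:]:
            self.data[key] = value
        self.applied = len(wal)

wal = []                       # the primary's WAL, in commit order
primary, standby = Node(), Node()

def commit(key, value):
    primary.data[key] = value
    wal.append((key, value))   # written to pg_wal, then streamed

commit("a", 1)
commit("b", 2)
standby.replay(wal)            # walreceiver receives, startup replays
# standby.data now matches primary.data
```

The key property the sketch preserves is that the standby converges on the primary's state purely by replaying the log in order, which is exactly why a standby can be promoted to primary during failover.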

Installing Prerequisite Software

Install VMware Postgres and other required packages on all three CentOS 7 hosts. Log in as root, or use sudo to execute these commands.

  1. Download the RPM from VMware Tanzu Network and install it on all three nodes.

    # yum install vmware-postgres-11.7-3.el7.x86_64.rpm 
    
  2. Install the etcd and Patroni prerequisites on all three nodes.

    # yum install -y python3 python3-devel gcc
    # yum install -y etcd
    # yum install -y python-psycopg2
    
    # cat << EOF > requirements.txt
    urllib3>=1.19.1,!=1.21
    boto
    PyYAML
    requests
    six >= 1.7
    kazoo>=1.3.1
    python-etcd>=0.4.3,<0.5
    python-consul>=0.7.0
    click>=4.1
    prettytable>=0.7
    tzlocal
    python-dateutil
    psutil>=2.0.0
    cdiff
    kubernetes>=2.0.0,<=7.0.0,!=4.0.*,!=5.0.*
    EOF
    
    # pip3 install -r requirements.txt
    
  3. Change the permissions on the /tmp directory.

    # chmod 775 /tmp
    

Configuring etcd

The method we use for setting up etcd is called static bootstrapping. This method requires knowing all cluster members’ IP addresses in advance, and those addresses must be static. The other bootstrapping method etcd offers is discovery bootstrapping. Discovery bootstrapping is mainly relevant when the etcd nodes run in a container orchestration environment such as Kubernetes, which has its own special network setup. Discovery bootstrapping is not covered in this topic.

The hosts in this document have IP addresses 10.193.102.203, 10.193.102.204, and 10.193.102.205. Substitute the actual IP addresses of your hosts in these instructions.

Follow these steps to set up the etcd cluster.

  1. Log in to the first node and become the postgres user.

    $ sudo su - postgres
    $ cd $HOME
    
  2. Create the etcd.yml file for the first node. This file is used to start the etcd cluster.

    $ vi /var/lib/pgsql/etcd.yml
    

    Insert the following contents and save the file.

    name: '<hostname1>'
    listen-peer-urls: 'http://10.193.102.203:2380'
    listen-client-urls: 'http://10.193.102.203:2379,http://127.0.0.1:2379'
    initial-advertise-peer-urls: 'http://10.193.102.203:2380'
    advertise-client-urls: 'http://10.193.102.203:2379'
    
    initial-cluster: '<hostname1>=http://10.193.102.203:2380,<hostname2>=http://10.193.102.204:2380,<hostname3>=http://10.193.102.205:2380'
    initial-cluster-state: 'new'
    initial-cluster-token: 'etcd-cluster-1'
    

    Where:

    • <hostname1>, <hostname2>, and <hostname3> are the hostnames of the three nodes.
    • 10.193.102.203 is the IP address of host 1.
    • 10.193.102.204 is the IP address of host 2.
    • 10.193.102.205 is the IP address of host 3.
  3. Start up etcd.

    $ etcd --config-file /var/lib/pgsql/etcd.yml > etcd_logfile 2>&1 &
    

    The background process should now be running. If it terminates, check the etcd_logfile for any errors. There could be an issue with the configuration file, or the 2379 or 2380 ports may already be in use.

  4. Log in to the second node (10.193.102.204) and become the postgres user.

    $ sudo su - postgres
    $ cd $HOME
    
  5. Create the etcd.yml configuration file for the second node.

    $ vi /var/lib/pgsql/etcd.yml
    

    Insert the following contents and save the file.

    name: '<hostname2>'
    listen-peer-urls: 'http://10.193.102.204:2380'
    listen-client-urls: 'http://10.193.102.204:2379,http://127.0.0.1:2379'
    initial-advertise-peer-urls: 'http://10.193.102.204:2380'
    advertise-client-urls: 'http://10.193.102.204:2379'
    
    initial-cluster: '<hostname1>=http://10.193.102.203:2380,<hostname2>=http://10.193.102.204:2380,<hostname3>=http://10.193.102.205:2380'
    initial-cluster-state: 'new'
    initial-cluster-token: 'etcd-cluster-1'
    

    Where:

    • <hostname1>, <hostname2>, and <hostname3> are the hostnames of the three nodes.
    • 10.193.102.203 is the IP address of host 1.
    • 10.193.102.204 is the IP address of host 2.
    • 10.193.102.205 is the IP address of host 3.
  6. Start etcd on the second node.

    $ etcd --config-file /var/lib/pgsql/etcd.yml > etcd_logfile 2>&1 &
    
  7. Log in to the third node (10.193.102.205) and become the postgres user.

    $ sudo su - postgres
    $ cd $HOME
    
  8. Create the etcd.yml configuration file for the third node.

    $ vi /var/lib/pgsql/etcd.yml
    

    Insert the following contents and save the file.

    name: '<hostname3>'
    listen-peer-urls: 'http://10.193.102.205:2380'
    listen-client-urls: 'http://10.193.102.205:2379,http://127.0.0.1:2379'
    initial-advertise-peer-urls: 'http://10.193.102.205:2380'
    advertise-client-urls: 'http://10.193.102.205:2379'
    
    initial-cluster: '<hostname1>=http://10.193.102.203:2380,<hostname2>=http://10.193.102.204:2380,<hostname3>=http://10.193.102.205:2380'
    initial-cluster-state: 'new'
    initial-cluster-token: 'etcd-cluster-1'
    

    Where:

    • <hostname1>, <hostname2>, and <hostname3> are the hostnames of the three nodes.
    • 10.193.102.203 is the IP address of host 1.
    • 10.193.102.204 is the IP address of host 2.
    • 10.193.102.205 is the IP address of host 3.
  9. Start etcd on the third node.

    $ etcd --config-file /var/lib/pgsql/etcd.yml > etcd_logfile 2>&1 &
    
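Each node lists the full cluster membership in initial-cluster as comma-separated name=peer-url pairs; only the name and listen/advertise addresses change per node. The following sketch splits that string into a mapping, which can help when double-checking the files across nodes (the helper function is ours, not part of etcd):

```python
def parse_initial_cluster(value: str) -> dict:
    """Split an etcd initial-cluster string into {member name: peer URL}."""
    members = {}
    for pair in value.split(","):
        name, _, url = pair.partition("=")
        members[name] = url
    return members

peers = parse_initial_cluster(
    "node1=http://10.193.102.203:2380,"
    "node2=http://10.193.102.204:2380,"
    "node3=http://10.193.102.205:2380"
)
print(peers["node2"])  # http://10.193.102.204:2380
```

The initial-cluster value must be identical on all three nodes; only the name, listen, and advertise settings differ.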

Monitoring and Managing etcd

Use the etcdctl command-line utility to manage the etcd cluster. View the command help by entering the command with no arguments. More functionality is available if you set the ETCDCTL_API environment variable to 3; the cluster-health and member subcommands shown below belong to the default v2 API.

$ export ETCDCTL_API=3

Verify that the etcd Setup is Correct

$ etcdctl cluster-health
member 3172be2b0510829 is healthy: got healthy result from http://10.193.102.203:2379
member bf2f62dbbe5028b6 is healthy: got healthy result from http://10.193.102.204:2379
member e03eb2acfb7726f9 is healthy: got healthy result from http://10.193.102.205:2379
cluster is healthy

View and Manage etcd Cluster Members

With the etcdctl member command you can list the members of the etcd cluster and add, remove, or update members. With no arguments, the command displays a help message.

$ etcdctl member
NAME:
etcdctl member - member add, remove and list subcommands

USAGE:
etcdctl member command [command options] [arguments...]

COMMANDS:
    list    enumerate existing cluster members
    add     add a new member to the etcd cluster
    remove  remove an existing member from the etcd cluster
    update  update an existing member in the etcd cluster

OPTIONS:
--help, -h  show help

This command lists the members and identifies the current Leader.

$ etcdctl member list
3172be2b0510829: name=postgres1 peerURLs=http://10.193.102.203:2380 clientURLs=http://10.193.102.203:2379 isLeader=false
bf2f62dbbe5028b6: name=postgres2 peerURLs=http://10.193.102.204:2380 clientURLs=http://10.193.102.204:2379 isLeader=true
e03eb2acfb7726f9: name=postgres3 peerURLs=http://10.193.102.205:2380 clientURLs=http://10.193.102.205:2379 isLeader=false
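Output like the member list above is easy to post-process in a script. For example, this sketch extracts the current leader's name from `etcdctl member list` (v2 API) output; the function name is ours:

```python
def find_leader(member_list_output: str) -> str:
    """Return the member name flagged isLeader=true in `etcdctl member list` output."""
    for line in member_list_output.splitlines():
        if "isLeader=true" in line:
            for field in line.split():
                if field.startswith("name="):
                    return field[len("name="):]
    raise ValueError("no leader found")

sample = """\
3172be2b0510829: name=postgres1 peerURLs=http://10.193.102.203:2380 clientURLs=http://10.193.102.203:2379 isLeader=false
bf2f62dbbe5028b6: name=postgres2 peerURLs=http://10.193.102.204:2380 clientURLs=http://10.193.102.204:2379 isLeader=true
"""
print(find_leader(sample))  # postgres2
```

Note that the etcd leader shown here is the leader of the etcd cluster itself; it is independent of which Patroni member holds the Postgres leader lock.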

Configuring Patroni

Each host also has its own Patroni YAML file. This example provides the minimal settings needed to get up and running with Patroni. For the full set of configuration options, see the Patroni documentation.

  1. Log in to the first node and become the postgres user.

    $ sudo su - postgres
    $ cd $HOME
    
  2. Create the postgresql0.yml configuration file.

    $ vi postgresql0.yml
    

    Insert the following contents and save the file.

    scope: patroni_test
    name: postgresql0
    restapi:
      listen: '10.193.102.203:8008'
      connect_address: '10.193.102.203:8008'
    etcd:
      hosts: '10.193.102.203:2379,10.193.102.204:2379,10.193.102.205:2379'
    bootstrap:
      dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
          use_pg_rewind: true
          use_slots: true
          parameters:
            hot_standby: 'on'
            wal_keep_segments: 20
            max_wal_senders: 8
            max_replication_slots: 8
      initdb:
        - encoding: UTF8
        - data-checksums
      pg_hba:
        - host replication replicator 10.193.102.203/32 md5
        - host replication replicator 10.193.102.204/32 md5
        - host replication replicator 10.193.102.205/32 md5
        - host all all 0.0.0.0/0 md5
      users:
        admin:
          password: admin
          options:
            - createrole
            - createdb
    postgresql:
      listen: '10.193.102.203:5432'
      connect_address: '10.193.102.203:5432'
      data_dir: /var/lib/pgsql/data
      pgpass: /tmp/pgpass0
      authentication:
        replication:
          username: replicator
          password: rep-pass
        superuser:
          username: postgres
          password: postgres
        rewind:
          username: rewind_user
          password: rewind_password
      parameters:
        unix_socket_directories: .
    tags:
      nofailover: false
      noloadbalance: false
      clonefrom: false
      nosync: false
    

    Where:

    • postgresql0 is the name of this Patroni cluster member.
    • 10.193.102.203 is the IP address of host 1.
    • 10.193.102.204 is the IP address of host 2.
    • 10.193.102.205 is the IP address of host 3.
  3. Start Patroni.

    $ patroni postgresql0.yml > patroni_member_1.log 2>&1 &
    

    This command starts a Patroni process that initializes the Postgres database on the host. If the process terminates, check the patroni_member_1.log log file for errors. A successful startup message is similar to this:

    2020-03-17 23:32:58.648 UTC [10491] LOG:  database system is ready to accept connections
    10.193.102.222:5432 - accepting connections
    10.193.102.222:5432 - accepting connections
    2020-03-17 23:32:59,608 INFO: establishing a new patroni connection to the postgres cluster
    2020-03-17 23:32:59,631 INFO: running post_bootstrap
    2020-03-17 23:32:59,651 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
    2020-03-17 23:32:59,665 INFO: initialized a new cluster
    2020-03-17 23:33:09,645 INFO: Lock owner: postgresql0; I am postgresql0
    2020-03-17 23:33:09,668 INFO: Lock owner: postgresql0; I am postgresql0
    2020-03-17 23:33:09,684 INFO: no action.  i am the leader with the lock
    
  4. Log in to the new database. The password for the postgres user is postgres. (It was set in the postgresql section of the YAML file you created.)

    $ psql -h segment1 -U postgres
    Password for user postgres: 
    psql (11.6 (Pivotal Postgres 11.6.1))
    Type "help" for help.
    
    postgres=#
    
  5. Repeat steps 1 through 4 for the second and third nodes, with the following differences:

    • On the second node, the name of the YAML file is postgresql1.yml. In the file, set the value of the name attribute to postgresql1 and use the second node’s IP address in the listen and connect_address values for restapi and postgresql.

      name: postgresql1
      
      restapi:
        listen: 10.193.102.204:8008
        connect_address: 10.193.102.204:8008
      
      postgresql:
        listen: 10.193.102.204:5432
        connect_address: 10.193.102.204:5432
      

      Start Patroni with this command.

      $ patroni postgresql1.yml > patroni_member_2.log 2>&1 &

    • On the third node, the name of the YAML file is postgresql2.yml. In the file, set the value of the name attribute to postgresql2 and use the third node’s IP address in the listen and connect_address values for restapi and postgresql.

      name: postgresql2
      
      restapi:
        listen: 10.193.102.205:8008
        connect_address: 10.193.102.205:8008
      
      postgresql:
        listen: 10.193.102.205:5432
        connect_address: 10.193.102.205:5432
      

      Start Patroni with this command.

      $ patroni postgresql2.yml > patroni_member_3.log 2>&1 &
  6. With Patroni started on all three nodes, use the patronictl command-line utility to view the cluster state.

    $ patronictl -c postgresql0.yml list
    +--------------+-------------+----------------+--------+---------+----+-----------+
    |   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
    +--------------+-------------+----------------+--------+---------+----+-----------+
    | patroni_test | postgresql0 | 10.193.102.203 | Leader | running | 12 |         0 |
    | patroni_test | postgresql1 | 10.193.102.204 |        | running | 12 |         0 |
    | patroni_test | postgresql2 | 10.193.102.205 |        | running | 12 |         0 |
    +--------------+-------------+----------------+--------+---------+----+-----------+
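Because the per-node files in step 5 differ only in the member name and addresses, it can help to generate those fields from a template instead of editing each file by hand. This sketch covers just the fields that vary; in practice the fragment would be merged into the full YAML shown in step 2:

```python
TEMPLATE = """\
name: {name}
restapi:
  listen: '{ip}:8008'
  connect_address: '{ip}:8008'
postgresql:
  listen: '{ip}:5432'
  connect_address: '{ip}:5432'
"""

nodes = {
    "postgresql0": "10.193.102.203",
    "postgresql1": "10.193.102.204",
    "postgresql2": "10.193.102.205",
}

for name, ip in nodes.items():
    # Each fragment holds the only values that differ between nodes.
    print(TEMPLATE.format(name=name, ip=ip))
```

Keeping the shared settings (scope, etcd hosts, bootstrap block) byte-identical across nodes avoids subtle mismatches when the cluster bootstraps.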
    

Next Steps

The Patroni cluster is now set up.

  • To check that WAL replication is working, create a table on the leader and insert data into it. Verify that the table appears on each of the standby hosts.
  • Manually switch the leader to another node using this command.

    $ patronictl -c postgresql0.yml switchover
    
  • Simulate an unplanned failover by rebooting any one of the hosts.