PostgreSQL automatic failover with pg_auto_failover

PostgreSQL is a powerful, open-source relational database with a strong reputation for reliability, robustness, and performance. It supports replication to achieve high availability for a very long. But it lacks an important feature on its own, that is the concept of automated failover. Of course, there are a lot of open-source tools available to do it, but it won’t come under the core PostgreSQL packages. Additional knowledge and maintenance are required to handle it. 

In this blog post, we will look at the tool pg_auto_failover which is very simple to use, easy for configuration, and maintenance.

pg_auto_failover is an open-source tool(basically an extension) that can be used to perform automated failover in the PostgreSQL production environments. You can find its source code here.

Simple Architecture:

The architecture with pg_auto_failover requires N+1 nodes where N is the number of PostgreSQL servers and the additional one node is required, it is a monitor node that acts both as a witness and an orchestrator.

A simple architecture with 2 PostgreSQL servers and pg_auto_failover.

pg_auto_failover with a primary and a secondary server

Here I am using 3 VMs to show the demonstration, the details are as follows

HostnameIPPurpose
monitor192.168.33.22Monitor Server
db1192.168.33.23Primary PostgreSQL server
db2192.168.33.24Secondary PostgreSQL server

Installation of pg_auto_failover(On all the nodes):

Pg_auto_failover is available in the package managers for Linux distributions. The following steps for installation can be used in Fedora, CentOS, or Red Hat distributions. For other distributions, please refer here.

Perform the following steps in the monitor and the DB servers

  1. Adding the required packages to the system 
curl https://install.citusdata.com/community/rpm.sh | sudo bash
  1. Install pg_auto_failover
sudo yum install -y pg-auto-failover14_12
  1. Confirm the installation
/usr/pgsql-12/bin/pg_autoctl --version

pg_autoctl will be installed along with pg_auto_failover extension and other PostgreSQL client /server/ library packages. pg_autoctl is a binary with subcommands to initialize and manage the PostgreSQL service. 

Note: If you are using PostgreSQL 14 or others, your binaries will be placed in the respective directories. (/usr/pgsql-XX/bin/pg_autoctl )

Configuration of pg_auto_failover:

Once we are done with installation on all the DB and the monitor servers, we are ready to start the configuration.

Configure Monitor First(Only on monitor):

We need to first configure the monitor as it will periodically monitor the DB nodes and watches their health. For complete configuration, we will be using pg_autoctl commands along with the subcommands for respective operations.

  1. Create your data and tmp directories(if not exist) and change the ownership to postgres user.
mkdir -p /tmp/pg_autoctl 
chown -R postgres.postgres  /tmp/pg_autoctl
mkdir -p /var/lib/pgsql/12/data
chown -R postgres.postgres  /var/lib/pgsql/12/data

Note: make sure your system has user postgres. If not, create it before performing the above steps.

  1. Execute the below command to configure the monitor
sudo -u postgres /usr/pgsql-12/bin/pg_autoctl create monitor --auth trust --ssl-self-signed --pgdata /var/lib/pgsql/12/data/ --pgctl /usr/pgsql-12/bin/pg_ctl

Where

pgdata —> Absolute path of the data directory

pgctl   —> Absolute path to the pg_ctl binary

The above successful execution of the command will print the following message

11:54:24 4205 INFO  Your pg_auto_failover monitor instance is now ready on port 5432.
11:54:24 4205 INFO  Monitor has been successfully initialized.
11:54:24 4200 WARN  pg_autoctl service monitor-init exited with exit status 0
11:54:24 4204 INFO  Postgres controller service received signal SIGTERM, terminating
11:54:24 4204 INFO  Stopping pg_autoctl postgres service
11:54:24 4204 INFO  /usr/pgsql-12/bin/pg_ctl --pgdata /var/lib/pgsql/12/data --wait stop --mode fast
11:54:25 4200 INFO  Stop pg_autoctl

It also installs the pgautofailover extension and grants access to the new user autoctl_node in the local PostgreSQL database.

Note: The monitor server also runs the PostgreSQL service to store the state changes in the database servers(Primary or Secondary), only for its own purpose. 

  1. Execute the following command to start the monitor server by postgres user.
su - postgres
/usr/pgsql-12/bin/postgres -D /var/lib/pgsql/12/data -p 5432 -h * &
  1. Check by login the PostgreSQL service (By postgres user)
 psql -d pg_auto_failover

Where 

-d —> pg_auto_failover is the database created by the monitor setup

pg_auto_failover=# \dx pgautofailover
             List of installed extensions
      Name      | Version | Schema |   Description
----------------+---------+--------+------------------
 pgautofailover | 1.4     | public | pg_auto_failover
(1 row)
  1. Run the pg_autoctl service as a monitor(Monitor server) under postgres user
/usr/pgsql-12/bin/pg_autoctl run &

It starts to listen on port 5432 in the monitor server for further configuration and monitoring

13:02:50 6588 INFO  Restarting service listener
13:02:50 6600 INFO   /usr/pgsql-12/bin/pg_autoctl do service listener --pgdata /var/lib/pgsql/12/data -v
13:02:50 6600 INFO  Managing the monitor at postgres://autoctl_node@10.0.2.15:5432/pg_auto_failover?sslmode=require
13:02:50 6600 INFO  Reloaded the new configuration from "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/12/data/pg_autoctl.cfg"
13:02:50 6600 INFO  Contents of "/var/lib/pgsql/12/data/postgresql-auto-failover.conf" have changed, overwriting
13:02:50 6607 INFO   /usr/pgsql-12/bin/postgres -D /var/lib/pgsql/12/data -p 5432 -h *
13:02:50 6597 INFO  Postgres is now serving PGDATA "/var/lib/pgsql/12/data" on port 5432 with pid 6607
13:02:50 6600 INFO  The version of extension "pgautofailover" is "1.4" on the monitor
13:02:50 6600 INFO  Contacting the monitor to LISTEN to its events.

It is perfect now.

After post successful configuration of the monitor server, it is time to move to set up the DB server.

Configuring DB Servers(Only on DB servers):

  1. The following command will be executed on the DB1, will configure the primary database.
sudo -u postgres /usr/pgsql-12/bin/pg_autoctl create postgres --pgdata /var/lib/pgsql/12/data/ --auth trust --ssl-self-signed --username aakash --dbname mydbops --hostname 192.168.33.23 --pgctl /usr/pgsql-12/bin/pg_ctl --monitor 'postgres://autoctl_node@192.168.33.22:5432/pg_auto_failover?sslmode=require'

Where 

pgdata      —> Absolute path of the data directory

username —> User to be created in the database

dbname    —> Database to be created in the database 

hostname —> Hostname of the primary database (Current Server)

pg_ctl        —> Absolute path of the pg_ctl binary

monitor:

autoctl_node           —> Default user to connect to the monitor server. It will be created in the monitor server at the time of configuration as mentioned earlier.

192.168.33.22:5432 —> Hostname or IP of the monitor server along with the port.

pg_auto_failover —> Database created in the monitor server at the time of configuration.

13:04:09 4202 INFO  Postgres is now serving PGDATA "/var/lib/pgsql/12/data" on port 5432 with pid 4224
13:04:09 4203 INFO  CREATE DATABASE mydbops;
13:04:09 4203 INFO  CREATE EXTENSION pg_stat_statements;
13:04:09 4203 INFO   /bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/12/data/server.crt -keyout /var/lib/pgsql/12/data/server.key -subj "/CN=192.168.33.23"
13:04:09 4203 INFO  Contents of "/var/lib/pgsql/12/data/postgresql-auto-failover.conf" have changed, overwriting
13:04:09 4203 INFO  Transition complete: current state is now "single"
13:04:09 4203 INFO  keeper has been successfully initialized.
13:04:09 4199 WARN  pg_autoctl service node-init exited with exit status 0
13:04:09 4202 INFO  Postgres controller service received signal SIGTERM, terminating
13:04:09 4202 INFO  Stopping pg_autoctl postgres service
13:04:09 4202 INFO  /usr/pgsql-12/bin/pg_ctl --pgdata /var/lib/pgsql/12/data --wait stop --mode fast
13:04:09 4199 INFO  Stop pg_autoctl
  1. Run the pg_autoctl service as keeper(DB server) under postgres user
 /usr/pgsql-12/bin/pg_autoctl run &
-bash-4.2$ 13:05:07 4274 INFO  Started pg_autoctl postgres service with pid 4277
13:05:07 4277 INFO   /usr/pgsql-12/bin/pg_autoctl do service postgres --pgdata /var/lib/pgsql/12/data -v
13:05:07 4274 INFO  Started pg_autoctl node-active service with pid 4278
13:05:07 4278 INFO   /usr/pgsql-12/bin/pg_autoctl do service node-active --pgdata /var/lib/pgsql/12/data -v
13:05:07 4278 INFO  Reloaded the new configuration from "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/12/data/pg_autoctl.cfg"
13:05:07 4278 INFO  pg_autoctl service is running, current state is "single"
13:05:07 4286 INFO   /usr/pgsql-12/bin/postgres -D /var/lib/pgsql/12/data -p 5432 -h *
13:05:07 4278 WARN  PostgreSQL was not running, restarted with pid 4286
13:05:07 4277 INFO  Postgres is now serving PGDATA "/var/lib/pgsql/12/data" on port 5432 with pid 4286

Checking the current status

-bash-4.2$  /usr/pgsql-12/bin/pg_autoctl show state
  Name |  Node |          Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | 192.168.33.23:5432 | 0/16DBA48 |       yes |              single |              single

-bash-4.2$

It is showing the status as single as currently, only one server is available.

Let’s do the same on DB2 as well

sudo -u postgres /usr/pgsql-12/bin/pg_autoctl create postgres --pgdata /var/lib/pgsql/12/data/ --auth trust --ssl-self-signed --username aakash --dbname mydbops --hostname 192.168.33.24 --pgctl /usr/pgsql-12/bin/pg_ctl --monitor 'postgres://autoctl_node@192.168.33.22:5432/pg_auto_failover?sslmode=require'

Only changed the value for option hostname here. Used the IP of the server DB2.

Then, run the following

/usr/pgsql-12/bin/pg_autoctl run &

Now, check the current status. You could see the following

-bash-4.2$  /usr/pgsql-12/bin/pg_autoctl show state
  Name |  Node |          Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+--------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | 192.168.33.23:5432 | 0/3000110 |       yes |             primary |             primary
node_8 |     8 | 192.168.33.24:5432 | 0/3000110 |       yes |           secondary |           secondary

While configuring the DB2, the pg_auto_failover performs the standby setup as well by performing pg_basebackup for the initial sync.

You can use the status command on any nodes in the architecture.

That’s all about the configuration. Try to do some testing with various scenarios

  1. Shutting down the PostgreSQL service
  2. Shutdown the VM

This will do the failover on its own.

In the next blog, we will look at the more advanced features available and also with the multiple standby architecture which is more opt for the production systems.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s