PGPool-II is an open-source proxy software that can be used for the PostgreSQL servers(ProxySQL for MySQL) environment. It lies between the application server/PostgreSQL client and the PostgreSQL database servers and can provide much-needed features like Load Balancing, Query Cache, Connection Pooling, and Watchdog. Each feature requires a detailed explanation with supporting examples. So, in this part let’s discuss the basic setup of PGPool-II with PostgreSQL streaming replication and in the upcoming parts, we will explore more about its features.
History of PGPool-II:
PGPool-II started its life as PGPool in 2003 just as a connection pooling software. Later in 2006, it was released as PGPool-II with many features and with the elimination of many limitations in PGPool. Still, PGPool-II is one of the most popular proxy available for the PostgreSQL servers.
The necessity of PGPool-II:
Apart from the advanced features, PGPool needs for few cases like automatic failover of connections to the backend healthy DB server and maximum connection errors(Too many connections) in the applications. Also, connection pooling reduces the overhead of creating new connections on the database.
Now, let’s start with the setup of PGPool-II for the two PostgreSQL servers(Primary and Secondary) configured with streaming replication. Considering as having the primary and the secondary already which setup has not been explained here.
|Server B||192.168.33.12||Primary PG|
|Server C||192.168.33.13||Secondary PG|
Below is the architecture of this setup which is planned to achieve.
Installation of PGPool-II :
The following steps have been performed on server A.
- Installing the required repo for CentOS 7
yum install -y http://www.pgpool.net/yum/rpms/4.1/redhat/rhel-7-x86_64/pgpool-II-release-4.1-2.noarch.rpm
2. Installing the PGPool-II
yum install -y pgpool-II-pg12-*
The PostgreSQL version used in this setup is 12, so the pg12* of PGPool-II is used for libraries and extension directories of PostgreSQL 12. Replace step 2 pg12* with the appropriate version of PostgreSQL.
In case of any difficulties, while installing using the package manager or unable to install the version required, it is good to go with source installation.
- Downloading the source code of the version required
2. Extract the downloaded file
tar xf pgpool-II-4.2.1.tar.gz
It will create the directory pgpool-II-4.2.1
3. Install the below packages that need to compile the code
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm yum install gcc make libpq -y
4. Post successful installation, configure, compile and install it using the below commands
./configure make make install
Configuration of PGPool-II:
The configuration file of PGPool-II is /etc/pgpool-II/pgpool.conf
First, creating two users on the primary server
- Monitor user – To perform a health check for the backend databases (only login is required)
- App user – For the application use case (Provide the required privileges based on the application purpose, read or write)
CREATE USER monitor WITH ENCRYPTED PASSWORD 'Monitor@321'; CREATE USER appuser WITH ENCRYPTED PASSWORD 'Appuser@321';
And creating a database for the monitor user
create database monitor;
Make sure these users are replicated to the secondary server and also add an entry on the primary and secondary servers pg_hba.conf file to access the database from pgpool server and reload it.
host all all 192.168.33.11/32 md5
And reload it to take effect.
Preparing the configuration file for the pgpool-II on server A
## Connection Details listen_addresses='*' port=9999 socket_dir = '/var/run/postgresql' ## Backend Server Details # Primary backend_hostname0 = '192.168.33.12' backend_port0 = 5432 backend_weight0 = 1 # Secondary backend_hostname1 = '192.168.33.13' backend_port1 = 5432 backend_weight1 = 1 ## Load Balancing load_balance_mode = 'ON' ## Replication Responsibilty master_slave_mode = 'ON' master_slave_sub_mode = 'stream' ## Streaming checks sr_check_period = 10 sr_check_user = 'monitor' sr_check_password = 'Monitor@321' sr_check_database = 'monitor' delay_threshold = 10240 ## Client Authentication allow_clear_text_frontend_auth = ‘ON’
Here explaining the purpose of each variable on the config, it can be modified based on the requirements.
Listen_addresses – From which IP addresses pgpool accepts incoming requests. * means it accepts all requests.
Port – On which port, pgpool wants to listen for client requests. The port here using needs to be used on the application or the client to connect to the database.
Socket_dir – Directory for socket accepting connections
backend_hostname*, backend_port*, backend_weight* – hostname specifies the PostgreSQL backend server, port specifies the port number of the backend server, weight defines the load balance ratio of the backend server
Here we are having two servers, so the backend variable has id 0 and 1.
In case, if we are adding one more server as the secondary in the future, it needs to be added as backend_hostname2, backend_port2, backend_weight2.
To enable load balancing on incoming select queries to both the backend servers.
To determine the replication system used for data replication. The value used is stream as it is a streaming replication.
Streaming checks variables are used to monitor the status of streaming replication using the created monitor user. The variable delay_threshold defines the maximum lag (in bytes) in the secondary server. If the lag exceeds, pgpool won’t serve any select queries to the secondary to avoid reading stale data on the application.
allow_clear_text_frontend_auth – It allows the pgpool to get the password in plain text from the frontend and use it for backend authentication.
Once all the above settings are done, restarting the pgpool takes all the settings into effect.
[root@server1 ~]# service pgpool restart Redirecting to /bin/systemctl restart pgpool.service [root@server1 ~]#
Checking the connection using the pgpool from pgpool server
psql -h<pgpool_ip> -U<appuser> -p<pgpool_port> <db_name> -c "<query>"
[root@server1 data]# psql -h 192.168.33.11 -Uappuser -p 9999 postgres -c "select datname from pg_database" Password for user appuser: datname ----------- postgres test1 template1 template0 monitor aakash mydbops (7 rows)
Now doing some tests and checking the status of the node and the queries distribution on the pgpool.
[root@server1 ~]# watch -n1 "psql -h 192.168.33.11 -Uappuser -p 9999 postgres -c 'select datname from pg_database'"
[root@aakash2 ~]# psql -h 192.168.33.11 -Uappuser -p 9999 postgres -c "show pool_nodes" node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+------------------- -- 0 | 192.168.33.12 | 5432 | up | 0.500000 | primary | 21 | false | 0 | | | 2021-01-14 16:00:4 9 1 | 192.168.33.13 | 5432 | up | 0.500000 | standby | 7 | true | 0 | | | 2021-01-14 16:00:4 9 (2 rows)
From the above output, the node status is online and queries are distributed.
That’s it on the basic pgpool setup with streaming replication. It is easy to add more replicas under it by adding backend_* variables.
Current architecture will be a single point of failure(SPOF) in case server1 (pgpool) goes down. So, in the next blog, we will look on how to achieve high availability on pgpool servers (PGPool Cluster) using watchdog and Virtual IP.