Backup and Recovery is one of the most essential jobs for every database administrator. We all know the importance of the backups for the databases. Here I am not going to write its importance, rather about the different types of backup tools available for the PostgreSQL server.
In PostgreSQL, we can perform both logical and physical backups as same as MySQL
- Logical Backups (Text Backup)
- Physical Backups (Copying data files + Redo Log files)
In this blog, we are going to compare the tools in PostgreSQL which performs logical backups. In our previous post, we had made a similar kind of comparison of logical backup tools available for the MySQL database(Mysqldump vs Mysqlpump vs Mydumper).
Logical Backups are simple and the textual representation of the data in the databases. It contains insert queries and create statements to reproduce the data in the destination. In PostgreSQL, we are having two logical backup tools
From the manual,
pg_dump — extract a PostgreSQL database into a script file or other archive file
It says, pg_dump connects to the database and simply reads all the data and writes it into a script file or other archive file based on the provided options. The important thing to note here is it dumps only one database at a time.
The command to take backup using pg_dump is
[root@mydbopslabs31]# pg_dump -Upostgres sbtest1 > sbtest1_script.sql Password: [root@mydbopslabs31]#
I had taken the backup of the sbtest1 database which contains 10 tables. The backup file size is 1.9GB.
[root@mydbopslabs31]# ls -lrth | grep sbtest1 -rw-r--r--. 1 root root 1.9G May 18 13:59 sbtest1_script.sql [root@mydbopslabs31]#
If we want to take the backup of another database sbtest2, we need to run the same command again
[root@mydbopslabs31]# pg_dump -Upostgres sbtest2 > sbtest2_script.sql Password: [root@mydbopslabs31]#
File Format and Parallelism:
pg_dump has different output file formats
- Plain (p)
- Custom (c)
- Directory (d)
- Tar (t)
By default, It uses plain file format (Text file) and it can be restored using psql command-line utility. The dump taken by using the remaining file format(c,d,t) can be restored only with pg_restore utility and also only directory format support parallelism (This is the only output format where multiple processes(jobs) can write their data at the same time at different files for each table – one file per table inside the directory)
For directory format, we should mention the directory name(-f) and it should not exists
[root@mydbopslabs31]# pg_dump -Upostgres -Fd sbtest1 -j 2 -f backup pg_dump: error: could not create directory "backup": File exists
The output file will be compressed by default when we are using the file format custom and directory. For plain file format, we need to mention the compression value explicitly by using the option –compress, otherwise, the output file won’t be compressed. The value for this compress option lies from 0 to 9. 0 means no compression and 9 means high compression. The default value is 6. Also, the Tar file format does not support compression.
[root@localhost]# pg_dump -Upostgres -Ft sbtest1 --compress 6 > sbtest1_script.tar pg_dump: error: compression is not supported by tar archive format
We can also use –compress option along with custom and directory file format if we want to change the compression value other than six or to disable compression by setting it to zero.
Here I am disabling compression with directory file format
[root@mydbopslabs31]# pg_dump -Upostgres -Fd sbtest1 -j 2 --compress 0 -f backup Password: [root@mydbopslabs31]#
pg_dump provides us with consistent backup. It connects to the Postgresql server and starts a transaction with a repeatable-read isolation level. Then it simply reads all the data. This repeatable-read makes the consistent snapshot of the data which does not change during the transaction. In simple words, changes made to the data while the backup is running won’t be in the backup.
Also, the backup is the snapshot of data when we initiated the dump. Let’s say, I have started the dump at 7:00 am and it completes at 7:30 am, the backup file contains the data up to 7:00 am as we initiated the dump at 7:00 am
pg_dump works only on the database level, not on the instance level. When we are using pg_dump, we won’t get all create user statements(Users on the instance level).
The main purpose of taking backups is to restore it somewhere in the worst cases. The backup is taken by using pg_dump with the plain file format (or with default options), all are simple text files and readable. It can be easily restored by using psql command-line utility.
Here I am loading the backup of sbtest1 database(plain format pg_dump) to sbtest3 database in the server
[root@mydbopslabs31]# psql -Upostgres sbtest3 < sbtest1_script.sql Password for user postgres: [root@mydbopslabs31]#
If you want to know the progress of data loading, add the PV command
[root@mydbopslabs31]# pv sbtest1_script.sql | psql -Upostgres sbtest3 Password for user postgres: [root@mydbopslabs31]#
The other file formats such as directory, custom, tar cannot be restored by using psql, it can be restored only by using pg_restore.
Loading the custom format dump backup using pg_restore
[root@mydbopslabs31]# pg_restore -Upostgres -d sbtest3 sbtest1_script.fc Password [root@mydbopslabs31]#
We can use multiple threads for restoration while using pg_restore and also it allows us to restore just a part of a database as well (Ex: a particular table from a database backup). Both cases are not possible in plain format dump using psql.
[root@mydbopslabs31]# pg_restore -Upostgres -d sbtest3 -j 4 sbtest1_script.fc Password: [root@mydbopslabs31]#
From the manual
pg_dumpall — extract a PostgreSQL database cluster into a script file
It explains, pg_dumpall connects to all the databases of an instance one by one and sends the data to a script file or standard output. Internally it calls pg_dump for each database in the cluster. The output produced by pg_dumpall is a plain text format.
[root@localhost]# pg_dumpall -Upostgres -f alldatabases.sql Password: [root@localhost]# ls -lrth | grep alldatabases.sql -rw-r--r--. 1 root root 4.0G May 19 06:04 alldatabases.sql [root@localhost]#
File Format and Parallelism:
pg_dumpall supports the only plain text file format. And also it does not support parallelism to make use of multiple CPU cores.
By default, pg_dumpall does not have any compression options. Still, we can use output redirection to make a compression backup
[root@mydbopslabs31]# pg_dumpall -Upostgres | gzip > alldatabases.sql.gz Password: [root@mydbopslabs31]#
Pg_dumpall does not guarantee us with consistent backup as it connects to all databases one by one.
Let’s say, we have initiated the dump at 7:00 am and it completes at 8:00 am, here we cannot be sure what time the backup relates to.
Dumping of global objects such as roles and tablespaces are possible only in pg_dumpall
[root@mydbopslabs31]# pg_dumpall -Upostgres --global > global_data.sql Password: [root@localhost]#
We are having only psql for restoring the backup taken by using pg_dumpall
Here is the comparison difference between pg_dump and pg_dumpall
Comparison Chart between pg_dump and pg_dumpall
These are the major differences between these two logical backup utilities. We need to choose one according to our requirements. We will discuss more about backup tools that are available in Postgresql in the upcoming blogs.