Chose right SST method MariaDB Cluster 10.2

                In this blog post, I am going to explain an interesting issue which I faced in one of our client project . Few weeks back , I got an requirement from our support client to construct a new MariaDB Galera Cluster ( 10.2.21 ) and an async slave with GTID. The Complete requirement is as below.

  • Need to construct the Galera cluster with 37 GB of data ( Using a given MySQL dump file )
  • Have to configure an asynchronous slave under the 3 node Galera cluster
  • The asynchronous slave was proposed with GTID to ease master node switchover.

So, as per the requirement the architecture will be, 

 Architecture  –

blogg

How to achieve this ?

A high level steps

    • Install MariaDB 10.2.21 server on all four nodes ( 3 Galera node with wsrep_on = ON & 1 async slave with wsrep_on = OFF ).
    • Bring up ( join to primary node ) other two Galera nodes through SST
    • Configure the asynchronous slave ( node4 ) from one of the Galera node backup ( Here I planned to use the PXB streaming / but unfortunately it was not happened )

SST issues with PXB –

           As per the plan, I had installed the MariaDB Server ( version – 10.2.21-MariaDB-log ) in all four nodes . After the installation, I bootstrapped the first node as primary . Then successfully imported the MySQL dump file ( 37 GB ) into primary node.

          Then kick started the MariaDB on node2 to stream the backup from bootstrapped node ( node1 ) through SST .  I used xtrabackup-v2 as the SST method and the sample config on node2 is below

MySQL sample config file  -

[mysqld]

server-id     = 102
log-bin
log-slave-updates
binlog-format = row
bind-address  = 0.0.0.0
log-error     = /var/log/mysql/mysql.log
datadir.      = /var/lib/mysql

#GTID
gtid_domain_id = 10

#galera
wsrep_on                        = ON
wsrep_provider                  = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address           = gcomm://192.168.33.13,192.168.33.14
wsrep_node_address              = 192.168.33.14
wsrep_sst_method                = xtrabackup-v2
wsrep_cluster_name              = labs_cluster
wsrep_node_name                 = n1
wsrep_provider_options          = "gcache.size=4G;gcs.fc_limit=1000;gcs.fc_master_slave=YES;gcache.recover=yes"
wsrep_sst_auth                  = "sst:Secure@Pas3"
wsrep_slave_threads             = 12

             I have cleared the MySQL data directory on second node ( node2 ) and started the MariaDB server for SST  .

[root@s2 mydbops]# service mysql stop
Stopping mysql (via systemctl): [ OK ]
[root@s2 mydbops]# less /etc/my.cnf | grep -i datadir
datadir = /var/lib/mysql
[root@s2 mydbops]# rm -rf /var/lib/mysql/*
[root@s2 mydbops]# service mysql start

            But, the things were not going well as expected and node2 was keep failing on SST . Below is the error I found from the error logs .

2019-02-02  9:59:16 139728621905664 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():737: Will never receive state. Need to abort.
2019-02-02  9:59:16 139728621905664 [Note] WSREP: gcomm: terminating thread
2019-02-02  9:59:16 139728621905664 [Note] WSREP: gcomm: joining thread
2019-02-02  9:59:16 139728621905664 [Note] WSREP: gcomm: closing backend
WSREP_SST: [ERROR] xtrabackup_checkpoints missing, failed innobackupex/SST on donor (20190202 09:59:16.743)
WSREP_SST: [ERROR] Cleanup after exit with status:2 (20190202 09:59:16.748)
2019-02-02  9:59:16 139728613512960 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.33.13' --datadir '/var/lib/mysql/'   --parent '5644' --binlog 'mydbopslabs13-bin' '': 2 (No such file or directory)
2019-02-02  9:59:16 139728613512960 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2019-02-02  9:59:16 139728879081600 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2019-02-02  9:59:16 139728879081600 [ERROR] Aborting
2019-02-02  9:59:29 139716069972096 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-bin-index where used;  This may cause repliction to break when this server acts as a master and has its hostname changed! Please use '--log-basename=mydbopslabs13' or '--log-bin=mydbopslabs13-bin' to avoid this problem.
2019-02-02  9:59:29 139716069972096 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-02-02  9:59:29 139716069972096 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2019-02-02  9:59:29 139716069972096 [Note] WSREP: wsrep_load(): Galera 25.3.25(r3836) by Coders

I got confused about this behaviour , because my data directory was already empty and there is no port issue ( all the Galera ports were enabled ) and the selinux also disabled  .

I kept on investigating this issue  “WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script” 

GTID implementation has major difference between MySQL & MariaDB ( Galera ) . i.e. wsrep_gtid_domain_id variable is available only on MariaDB Galera clusters not in Percona XtraDB Cluster .

Percona Xtradb Cluster -

mysql> select @@version_comment, @@version\G
*************************** 1. row ***************************
@@version_comment: Percona XtraDB Cluster (GPL), Release rel18, Revision 4a4da7e, WSREP version 29.24, wsrep_29.24
        @@version: 5.7.20-18-57-log
1 row in set (0.00 sec)

mysql> show global variables like 'wsrep_gtid_domain_id'\G
Empty set (0.01 sec)


MariaDB Galera -

root@localhost:(none)>select @@version_comment, @@version\G
*************************** 1. row ***************************
@@version_comment: MariaDB Server
        @@version: 10.2.23-MariaDB-log
1 row in set (0.00 sec)

root@localhost:(none)>show global variables like 'wsrep_gtid_domain_id'\G
*************************** 1. row ***************************
Variable_name: wsrep_gtid_domain_id
        Value: 0
1 row in set (0.00 sec)

As per the below error my SST script is getting failed without reading the uuid:seqno and wsrep_gtid_domain_id .

2019-02-02  9:59:16 139728613512960 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script. 2019-02-02  9:59:16 139728879081600 [ERROR] WSREP: SST failed: 2 (No such file or directory) 2019-02-02  9:59:16 139728879081600 [ERROR] Aborting

I started to look into the SST scripts , is the SST scripts are really capable for get those details ?

with Percona xtrabackup-v2 SST script -

[root@s2 mydbops]# which wsrep_sst_xtrabackup-v2
/bin/wsrep_sst_xtrabackup-v2

[root@s2 mydbops]# less /bin/wsrep_sst_xtrabackup-v2 | grep -i wsrep_gtid
[root@s2 mydbops]#


with mariabackup SST script -

[root@s2 mydbops]# which wsrep_sst_mariabackup
/usr/bin/wsrep_sst_mariabackup

[root@s2 mydbops]#  less /usr/bin/wsrep_sst_mariabackup | grep -i wsrep_gtid
# Store donor's wsrep GTID (state ID) and wsrep_gtid_domain_id
# Store donor's wsrep GTID (state ID) and wsrep_gtid_domain_id
cat "${MAGIC_FILE}" # Output : UUID:seqno wsrep_gtid_domain_id
[root@s2 mydbops]#

Yes, only Mariabackup SST script has that .  I finally identified that I had the problem with my SST method . Below I am sharing more information from Mariabackup SST script .

if [[ ! -r ${MAGIC_FILE} ]];then 
wsrep_log_error "SST magic file ${MAGIC_FILE} not found/readable"
exit 2
fi
wsrep_log_info "Galera co-ords from recovery: $(cat ${MAGIC_FILE})"
cat "${MAGIC_FILE}" # Output : UUID:seqno wsrep_gtid_domain_id
wsrep_log_info "Total time on joiner: $totime seconds"

Finally I was able to fix the issue by changing the SST method from Percona xtrabackup-v2 to Mariabackup .

MySQL config file - 

wsrep_sst_method                = mariabackup 

SST logs with Mariabackup - 

WSREP_SST: [INFO] Proceeding with SST (20190202 10:25:48.760) WSREP_SST: [INFO] Waiting for SST streaming to complete! (20190202 10:25:48.804) 
2019-02-02 10:26:02 140653493200640 [Note] WSREP: 1.0 (n1): State transfer to 0.0 (n2) complete. 
2019-02-02 10:26:03 140653493200640 [Note] WSREP: Member 1.0 (n1) synced with group. WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20190202 10:26:03.163) &>${DATA}/innobackup.move.log (20190202 10:26:03.922) WSREP_SST: [INFO] Removing the sst_in_progress file (20190202 10:26:04.125) 
2019-02-02 10:26:04 140653751167104 [Note] WSREP: SST complete, seqno: 3

Finally the SST went smoothly and my cluster is in sync and successfully constructed the async slave as well using Mariabackup .

Additionally ,

Percona Xtrabackup can be used with MariaDB 10.2.x version but it is not compatible with 10.3 and higher versions . The MariaDB Official Page explains the limitation of PXC with MariaDB.

Beware of the SST methods in MariaDB and make a right choice of your SST. PXB cant be good choice for SST method with MariaDB as the features deviates in the way it is designed between MySQL and MariaDB . Mariabackup will be the right SST method with MariaDB cluster from 10.2 and higher releases.

Photo by Jens Lelie on Unsplash
Advertisements

One thought on “Chose right SST method MariaDB Cluster 10.2

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s