How to Switch Replica Master of a non-GTID Slave in Percona Cluster ?

Introduction –

Recently i worked on a production issue for one of our client under support .They have a architecture of a three node Galera cluster with one asynchronous slave .

  • Node1 – 172.10.2.11
  • Node2 – 172.10.2.12
  • Node3 – 172.10.2.13
  • Replica – 172.10.2.14

Architecture –

before switch

The slave(replica) was configured with node3 as replica master. Unfortunately the node 3 was crashed with an OOM killer ,also server has a low gcache size, so when i am trying to start the node 3 , it went to SST . Here the data size was around 2.6 TB , in general for completion of whole SST and joining the node back to cluster will take around  approximately 12 hours.

As i told earlier, the replication slave was under node3  and all reporting applications were pointed to async slave only .So, I can’t wait upto 12 hours as it will affect my entire client reporting environment.

To overcome this scenario, i had planned to switch my async slave under node 2 ( 172.10.2.12 ) . By this blog post, i am going to explain the steps how i was able to achieve this .

Step 1:

Stop the slave server for getting persistent log file and position in async slave.

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.11 sec)

From the command SHOW SLAVE STATUS\G get the Relay_Log_File and Read_Master_Log_Pos.

mysql> show slave status\G
*************************** 1. row ***************************
         Slave_IO_State: Reconnecting after a failed master event read
                 Master_Host: 172.10.2.13
                 Master_User: repl
                 Master_Port: 3306
               Connect_Retry: 60
              Master_Log_File: mysql-bin.003099
        Read_Master_Log_Pos: 677232126
            Relay_Log_File: replica-relay-bin.009093
                Relay_Log_Pos: 677232323
        Relay_Master_Log_File: mysql-bin.003099

Here Relay_Log_File is replica-relay-bin.009093 and Read_Master_Log_Pos is 677232126

Step 2:

Decode the respective Relay_Log_File and get the Xid value using the position of Read_Master_Log_Pos .

mysqlbinlog --no-defaults --base64-output=decode-rows -vv /data/mysql/replica-relay-bin.009093 > /home/mydbops/replica-relay-bin.009093.txt

[mydbops@replica ~]$ less replica-relay-bin.009093.txt | grep "677232126"#181202  3:27:13 server id 12  end_log_pos 677232126 CRC32 0xc818fec0   Xid = 572228464

Extended view of replica-relay-bin.009093

# at 677232001
#181202  3:27:13 server id 12  end_log_pos 677231872 CRC32 0x19bffe41   Query thread_id=7239194 exec_time=1     error_code=0
SET TIMESTAMP=1543701433/*!*/;
SET @@session.foreign_key_checks=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
BEGIN
/*!*/;
# at 677232069
#181202  3:27:13 server id 12  end_log_pos 677231983 CRC32 0x8c9b2a51   Rows_query
# insert into mydbops.wsrep_paused(tag,counter) values ( NAME_CONST('node',2),@new_value)
# at 677232180
#181202  3:27:13 server id 12  end_log_pos 677232046 CRC32 0xa0e15b87   Table_map: `mydbops`.`wsrep_paused` mapped to number 297
# at 677232243
#181202  3:27:13 server id 12  end_log_pos 677232095 CRC32 0xe5c9c37e   Write_rows: table id 297 flags: STMT_END_F
### INSERT INTO `mydbops`.`wsrep_paused`
### SET
###   @1=1517733 /* INT meta=0 nullable=0 is_null=0 */
###   @2=2 /* TINYINT meta=0 nullable=0 is_null=0 */
###   @3=7555.01              /* FLOAT meta=4 nullable=1 is_null=0 */
###   @4=1543701433 /* TIMESTAMP(0) meta=0 nullable=1 is_null=0 */
# at 677232292
#181202  3:27:13 server id 12  end_log_pos 677232126 CRC32 0xc818fec0   Xid=572228464
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;

Step 3:

Using Xid number get end_log_pos from node 2 to which we are going configure replication.

[mydbops@db02 ~]$  sudo mysqlbinlog /data/mysql/mysql-bin.002279 --base64-output=DECODE-ROWS --verbose | grep "Xid = 572228464"#181202  3:27:13 server id 12  end_log_pos 815896364 CRC32 0x519db5da   Xid = 572228464

from Xid  ‘572228464’ we got the binlog position ‘815896364‘ in mysql-bin.002279

Step 4:

Pointing the replication to node 2 by running change master in slave server.

mysql> reset slave;
Query OK, 0 rows affected (0.11 sec)

mysql> reset slave all; 
Query OK, 0 rows affected (0.01 sec)

mysql> change master to master_host='172.20.4.12',master_port=3306,master_log_file='mysql-bin.002279',master_log_pos=815896364;
Query OK, 0 rows affected (0.02 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
           Slave_IO_State: Queueing master event to the relay log
                  Master_Host: 172.20.4.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.002280
          Read_Master_Log_Pos: 100575837
               Relay_Log_File: replica-relay-bin.000002
                Relay_Log_Pos: 317
        Relay_Master_Log_File: mysql-bin.002279
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:

After switch over current master for replica is node 2 (172.20.4.12):

after_switch

Conclusion:

  • From the four simple steps i was able to get back the slave and make it  production ready, which reduces the impact of downtime of a switch with out GTID. 
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s