MariaDB/XtraDB Cluster Default Disk Files and Their Significance

Galera is the core technology behind Percona XtraDB Cluster and MariaDB Cluster. Anyone running a Galera cluster will have noticed files in the data directory beyond the native MySQL and InnoDB files. Understanding them shows what is happening internally, and each one offers clues that help when troubleshooting the cluster. This post explains the Galera-related files present by default in the data directory and the significance of each.

A few additional files in the data directory support Galera cluster operations. They are:

  • Galera cache
  • Grastate.dat
  • Gvwstate.dat
  • GRA_*.log

Each of these files has its own purpose in the cluster. The sections below go through them one by one.

Galera Cache:

This file stores writesets. Its size is controlled by the gcache.size option, which is set through wsrep_provider_options. A larger gcache.size holds more writesets, which raises the chance that a re-joining node can catch up through IST (Incremental State Transfer) and avoid a full SST (State Snapshot Transfer).

The right gcache size depends on the volume of the write stream the nodes receive.
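To make the IST-versus-SST point concrete, here is a minimal sketch of the check involved. It assumes the donor's oldest cached seqno is read from the wsrep_local_cached_downto status variable and the joiner's position from its grastate.dat. The function name and arguments are hypothetical, and the real decision is made inside Galera, not by a script like this.

```python
def ist_possible(joiner_uuid, joiner_seqno, donor_uuid, donor_cached_downto):
    """Rough check: does the donor's gcache still hold what the joiner is missing?

    joiner_uuid / joiner_seqno : values from the joiner's grastate.dat
    donor_uuid                 : cluster state UUID on the donor
    donor_cached_downto        : lowest seqno still in the donor's gcache
                                 (assumed to come from wsrep_local_cached_downto)
    """
    # A node from a different cluster history can never use IST.
    if joiner_uuid != donor_uuid:
        return False
    # seqno -1 means the joiner's position is unknown.
    if joiner_seqno < 0:
        return False
    # The joiner needs every writeset from joiner_seqno + 1 onwards.
    return joiner_seqno + 1 >= donor_cached_downto


if __name__ == "__main__":
    uuid = "eaad8819-e6a3-11e6-84d2-3e87f6331c69"
    print(ist_possible(uuid, 94312, uuid, 90000))  # gcache reaches back far enough
    print(ist_possible(uuid, 94312, uuid, 95000))  # needed writesets already purged
```

A bigger gcache keeps the lowest cached seqno further back, which is why the first case becomes more likely as gcache.size grows.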

Calculating gcache size for your workload :

To estimate the gcache size for your workload we need to consider two status variables.

  • wsrep_replicated_bytes : This variable represents the total size of write-sets replicated. ( primary )
  • wsrep_received_bytes : This variable represents the total size of write-sets received from other nodes. ( Secondary )

Combining these two we get the total size of writesets over any period.

Sampling both counters one minute apart:

mysql> show global status like 'wsrep_received_bytes'; 
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| wsrep_received_bytes | 62415121 |
+----------------------+----------+

mysql> show global status like 'wsrep_replicated_bytes'; 
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| wsrep_replicated_bytes | 0     |
+------------------------+-------+

mysql> select sleep(60); 
+-----------+
| sleep(60) |
+-----------+
| 0         |
+-----------+

mysql> show global status like 'wsrep_received_bytes'; 
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| wsrep_received_bytes | 66619529 |
+----------------------+----------+

mysql> show global status like 'wsrep_replicated_bytes';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| wsrep_replicated_bytes | 0     |
+------------------------+-------+

Writesets received in bytes or MB per minute :

The gcache write rate is the difference between the second sample and the first:

(wsrep_received_bytes + wsrep_replicated_bytes) after - (wsrep_received_bytes + wsrep_replicated_bytes) before

(66619529 + 0) - (62415121 + 0) = 4204408 bytes ( ~4 MB )

Based on this calculation, the production workload writes about 4 MB per minute to the gcache, so a one-hour window needs roughly 240 MB. Sizing the gcache this way covers unexpected outages and short maintenance windows without further tuning.
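The same arithmetic can be wrapped in a small helper. This is only a sketch: the function name, the dict layout and the one-hour default are my own choices for illustration, and the counter values are the ones sampled above.

```python
def estimate_gcache_bytes(before, after, sample_seconds=60, window_seconds=3600):
    """Estimate the gcache size needed to cover `window_seconds` of writes.

    `before` and `after` are dicts holding the two status counters,
    sampled `sample_seconds` apart.
    """
    counters = ("wsrep_received_bytes", "wsrep_replicated_bytes")
    # Bytes written to the gcache during the sampling interval.
    delta = sum(after[name] - before[name] for name in counters)
    # Scale to the target window, rounding up to a whole byte.
    return (delta * window_seconds + sample_seconds - 1) // sample_seconds


if __name__ == "__main__":
    before = {"wsrep_received_bytes": 62415121, "wsrep_replicated_bytes": 0}
    after = {"wsrep_received_bytes": 66619529, "wsrep_replicated_bytes": 0}
    needed = estimate_gcache_bytes(before, after)
    print(needed, "bytes =", round(needed / 1024 ** 2, 1), "MiB")
```

The result lands slightly above 240 MB because the script does not round 4204408 bytes down to 4 MB first. A single one-minute sample is a thin basis, so it is worth repeating the measurement during peak write hours.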

Grastate.dat :

This file holds the current Galera state information. It is updated at the time of shutdown or crash. Below are sample contents of grastate.dat.

# GALERA saved state
version: 2.1
uuid:    eaad8819-e6a3-11e6-84d2-3e87f6331c69
seqno:   94312
safe_to_bootstrap: 0

version : The version of the grastate file format.

uuid : The cluster state UUID, common to all nodes in the cluster.

seqno : A 64-bit signed integer giving the position of the last writeset on the node. A given writeset has the same seqno on every node of the cluster. The seqno stays at 0 until a writeset has been generated. A value greater than 0 indicates that the node's previous shutdown was clean. A value of -1 means either that the server did not shut down cleanly or that the node is currently running. The seqno also identifies the node with the latest writeset, which is the one to bootstrap from after a complete cluster crash.

safe_to_bootstrap : After a cluster crash, a node refuses to bootstrap because other nodes might hold later writesets. In that case, set safe_to_bootstrap to 1 on the node with the latest writeset, after validating the logs of all cluster nodes.
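As an illustration of how these fields are used after a full cluster crash, the sketch below parses grastate.dat text and picks the node with the highest seqno. The function names and the node labels are hypothetical. Returning None when any node shows seqno -1 is my own conservative assumption, since the file alone cannot tell that node's real position.

```python
def parse_grastate(text):
    """Parse the contents of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# GALERA saved state" comment
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    state["seqno"] = int(state["seqno"])
    state["safe_to_bootstrap"] = int(state.get("safe_to_bootstrap", 0))
    return state


def pick_bootstrap_node(states):
    """Return the name of the node with the highest seqno.

    `states` maps a node name to its parsed grastate. Returns None if any
    node reports seqno -1, because its position must be recovered from
    the logs before the nodes can be compared.
    """
    if not states or any(s["seqno"] < 0 for s in states.values()):
        return None
    return max(states, key=lambda name: states[name]["seqno"])


if __name__ == "__main__":
    sample = """# GALERA saved state
version: 2.1
uuid:    eaad8819-e6a3-11e6-84d2-3e87f6331c69
seqno:   94312
safe_to_bootstrap: 0
"""
    node1 = parse_grastate(sample)
    node2 = dict(node1, seqno=94300)  # pretend a second node is slightly behind
    print(pick_bootstrap_node({"node1": node1, "node2": node2}))
```

The script only suggests a candidate. Changing safe_to_bootstrap should still follow a manual check of the logs on every node.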

Read more about Troubleshooting a rejected node.

Gvwstate.dat :


This file contains information about cluster node states from the node’s view. We can find the UUID of the current node and other members of the cluster.


The node builds and updates this file when the cluster is formed or when the Primary Component changes. This ensures the node retains the latest Primary Component state it was in, so it has the file to reference if it loses connectivity. If the node shuts down gracefully, it removes the file. The file survives only after an abrupt shutdown, so that the latest state is preserved.

Below are sample contents of the gvwstate.dat file:

my_uuid: ff5f71e1-7426-11eb-90d4-8244c5aaca11
#vwbeg
view_id: 3 181668ff-7425-11eb-a2b2-5b7e558ce0dc 52
bootstrap: 0
member: 181668ff-7425-11eb-a2b2-5b7e558ce0dc 0
member: 1da6674a-7426-11eb-8e6c-6733fb34ecad 0
member: eb2aebbc-7423-11eb-ba18-4b7a69fad471 0
member: ff5f71e1-7426-11eb-90d4-8244c5aaca11 0
#vwend

Fetching the UUID and SEGMENT columns from the pxc_cluster_view table lets us cross-validate it.

mydbops@localhost>select UUID,SEGMENT from performance_schema.pxc_cluster_view;
+--------------------------------------+---------+
| UUID                                 | SEGMENT |
+--------------------------------------+---------+
| 181668ff-7425-11eb-a2b2-5b7e558ce0dc |       0 |
| 1da6674a-7426-11eb-8e6c-6733fb34ecad |       0 |
| eb2aebbc-7423-11eb-ba18-4b7a69fad471 |       0 |
| ff5f71e1-7426-11eb-90d4-8244c5aaca11 |       0 |
+--------------------------------------+---------+

The gvwstate.dat file has two parts:

  • Node information
  • View information

Node information provides the node’s UUID in the my_uuid field.

View information, contained between the #vwbeg and #vwend tags, describes the node’s view of the Primary Component.

#vwbeg
.
.
.
#vwend

The view_id forms an identifier for the view from three parts:

  • view_type, which always has the value 3 to indicate the primary view
  • view_uuid and view_seq, which together form a unique value for the identifier

The member lines list the UUIDs of the individual nodes in the Primary Component.
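The layout described above is simple enough to parse by hand. The following sketch reads gvwstate.dat text into a dict. The function name and the dict keys are hypothetical, and treating the number after each member UUID as the segment is an assumption based on the pxc_cluster_view cross-check shown earlier.

```python
def parse_gvwstate(text):
    """Parse the contents of a gvwstate.dat file into a dict."""
    view = {"my_uuid": None, "view_id": None, "bootstrap": None, "members": {}}
    in_view = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "#vwbeg":
            in_view = True
        elif line == "#vwend":
            in_view = False
        elif line.startswith("my_uuid:"):
            view["my_uuid"] = line.split(":", 1)[1].strip()
        elif in_view and line.startswith("view_id:"):
            # view_id: <view_type> <view_uuid> <view_seq>
            view_type, view_uuid, view_seq = line.split(":", 1)[1].split()
            view["view_id"] = {
                "type": int(view_type),
                "uuid": view_uuid,
                "seq": int(view_seq),
            }
        elif in_view and line.startswith("bootstrap:"):
            view["bootstrap"] = int(line.split(":", 1)[1])
        elif in_view and line.startswith("member:"):
            # member: <node uuid> <segment (assumed)>
            uuid, segment = line.split(":", 1)[1].split()
            view["members"][uuid] = int(segment)
    return view


if __name__ == "__main__":
    sample = """my_uuid: ff5f71e1-7426-11eb-90d4-8244c5aaca11
#vwbeg
view_id: 3 181668ff-7425-11eb-a2b2-5b7e558ce0dc 52
bootstrap: 0
member: 181668ff-7425-11eb-a2b2-5b7e558ce0dc 0
member: ff5f71e1-7426-11eb-90d4-8244c5aaca11 0
#vwend
"""
    view = parse_gvwstate(sample)
    print(view["view_id"])
    print(view["my_uuid"] in view["members"])
```

Checking that my_uuid appears among the members is a quick way to confirm the node was part of the Primary Component it recorded.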

GRA_*.log :

When a node fails to apply an event, the database server writes the failed transaction to a special binary log file named GRA_*.log. These files also capture harmless errors, such as dropping a database or table that does not exist or a duplicate entry for a primary key. The mere presence of such files in the data directory is therefore no cause for alarm.

To analyze the GRA_*.log files you can refer to the post by Frederic Descamps.

https://www.percona.com/blog/2012/12/19/percona-xtradb-cluster-pxc-what-about-gra_-log-files/

I hope this blog helps you understand the basic significance of the default files present in a Galera Cluster ( Percona XtraDB Cluster / MariaDB Cluster ).
