Save Space on MySQL data with Column Compression

                                   Recently, One of our client reached our Remote DBA team with a requirement to reduce the size of the table as it is having many text columns with huge number of records. At preliminary check , I have validated the table size and its row format, as it was  in compressed  format already. 

Later I checked on other possibilities to compress the text columns further, At that time, then I came across per-column compression feature in Percona MySQL server (From 5.7.17-11) which features individual column compression and we were using Percona XtraDB cluster servers in that environment. We explored this feature and got major improvements in size reduction.

In this blog post, we will look on this feature and how to effectively use it. To test the same, I have installed Percona server for MySQL 5.7 in a machine with 4 cores of CPU and 8GB of RAM. I have used this tool mysql_random_data_load to load random data .Now let us walk through this feature.

Per-Column compression is a feature which helps us to store the data of columns in compressed format. The data will be compressed while writing on disk and decompressed while reading. It is using zlib library for compression.

It provides a better compression ratio with text data having large number of predefined words by using dictionaries. We need to select right column for compression which can provide good compression ratio else compression will be an additional overhead.

Here is a simple example.

mysql> select length('mydbops'),length(compress('mydbops'));
+------------------+-----------------------------+
| length('mydbops') | length(compress('mydbops'))|
+------------------+-----------------------------+
|                7 |           19                |
+------------------+-----------------------------+
1 row in set (0.00 sec) 
mysql> select length('w…...testededed'),length(compress('w…...testededed'));
+---------------------------+-------------------------------------+
| length('w…...testededed') | length(compress('w…...testededed')) |
+---------------------------+-------------------------------------+
|                78         |                     41              |
+---------------------------+-------------------------------------+
1 row in set (0.00 sec)

In the first example we observed that compressed data size is more than the original because of the compression header and small amount of data with less repetitive patterns. So, we should be more careful in selecting the column for compression.

I have created a test database named “compression” and table with below structure .

mysql> show create table compress_test\G
*************************** 1. row ***************************
       Table: compress_test
Create Table: CREATE TABLE `compress_test` (
  `id` varchar(100) NOT NULL,
  `data` text,
  `health_record_type_id` int(11) DEFAULT NULL,
  `created_time` datetime DEFAULT NULL,
  `created_user` varchar(100) DEFAULT NULL,
  `sync_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

I have created three tables with 2M records and the size of the tables are around 9GB

[root@labs test]# ls -lrth compress_test*.ibd
-rw-r-----. 1 mysql mysql 8.9G Sep 24 09:23 compress_test3.ibd
-rw-r-----. 1 mysql mysql 8.9G Sep 24 09:40 compress_test2.ibd
-rw-r-----. 1 mysql mysql 8.9G Sep 24 10:02 compress_test1.ibd

Ok, now we will start compressing the tables by using different methods and to analyse the best one.

        1) row_format=compressed

We will change the row format to compressed in table compress_test1.

mysql> alter table compress_test1 row_format=compressed;
Query OK, 0 rows affected (7 min 21.10 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now the size of the table is reduced by 60%.

[root@labs test]# ls -lrth compress_test1.ibd
-rw-r-----. 1 mysql mysql 3.6G Sep 25 01:32 compress_test1.ibd

            2 )  Now we can implement column compression in test_records2 table. Column compression supports Blob (tiny,medium,long), text(tiny,medium,long), varchar,varbinary, json data types.

mysql> alter table compress_test2 modify data text column_format compressed;
Query OK, 2000000 rows affected (5 min 34.41 sec)
Records: 2000000  Duplicates: 0  Warnings: 0
[root@labs test]# ls -lrth compress_test2.ibd
-rw-r-----. 1 mysql mysql 1.4G Sep 25 01:48 compress_test2.ibd

Great, now the size of the table is about 1.4 GB and it is reduced by around 85% from its original size. 

Is it possible to reduce it further? 

Yes, it is. We can achieve further reduction in size by using column dictionary. 

Column dictionary is a method that contains predefined set of expected patterns, will help us to achieve better compression ratio.

           3) I have stored JSON values in that column. As it is a JSON, it is in the format of key/value pair. Key is always constant while value vary. So key is repetitive. I’m going to create dictionary based on the key. If you are storing some other data in a column, use most repetitive things to create the column dictionary. Otherwise dictionary won’t increase compression ratio.

mysql>SET @data = '""name","freq","smkCtrlVis","alcCtrlVis","trtDn","conDn","remarks","reviews","treatDate","reviewDate""';

Query OK, 0 rows affected (0.17 sec)
mysql> CREATE COMPRESSION_DICTIONARY test_dictionary (@data);
Query OK, 0 rows affected (0.02 sec)

Implementing column compression in test_records3 with created column dictionary test_dictionary.

mysql> alter table compress_test3 modify data text column_format compressed WITH COMPRESSION_DICTIONARY test_dictionary;
Query OK, 2000000 rows affected (6 min 42.12 sec)
Records: 2000000  Duplicates: 0  Warnings: 0
[root@labs test]# ls -lrth compress_test3.ibd
-rw-r-----. 1 mysql mysql 1.2G Sep 25 02:27 compress_test3.ibd

Better, It is reduced by 200 MB further. We can achieve better compression ratio using column dictionary if we have a lot of repeated patterns. Otherwise, we can avoid it.

MethodTable Name Size(GB)
row_format = compressedcompress_test13.6
column_format compressedcompress_test21.4
WITH COMPRESSION DICTIONARY compress_test21.2

We can control the compression by using the system variables like innodb_compressed_columns_threshold and innodb_compressed_columns_zip_level. 

mysql> show global variables like 'innodb_compressed_%';
+-------------------------------------+-------+
| Variable_name                       | Value |
+-------------------------------------+-------+
| innodb_compressed_columns_threshold | 96    |
| innodb_compressed_columns_zip_level | 6     |
+-------------------------------------+-------+
2 rows in set (0.13 sec)

Innodb_compressed_columns_zip_level(0-9):

This variable is used to specify the compression level used for compressed columns. Specifying 0 will use no compression, 1 the fastest and 9 the highest compression. Default value is 6. Specifying higher value will give better compression but the tradeoff is lower speed and increase in cpu utilization. The default value 6 is good for both compression and speed.

 Innodb_compressed_columns_threshold(Bytes):

          The value inserted will be compressed if the length exceeds this Innodb_compressed_columns_threshold value. The default value is 96. If it is less than that, the data will be stored in uncompressed format. This variable will be helpful to avoid unwanted compression of very small data.(As explained earlier, compression will be an overhead if the data is smaller)

Limitations:

  1. We cannot create index on compressed columns
mysql> alter table compress_test add index idx_data(data);
ERROR 11001 (HY000): Compressed column 'data' is not allowed in the key list

       2) discard/import tablespace will not support for tables with compressed columns. To do that, we need to decompress prior to it.

       3) Also, we need to mention enable-compressed-columns while taking backup using mysqldump, otherwise it will skip the compression statement. For dictionaries, enable-compressed-columns-with-dictionaries will be used.

Example:

  1. Dump without compression options
mysqldump test compress_test --no-data

DROP TABLE IF EXISTS `compress_test`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `compress_test` (
  `id` varchar(100) NOT NULL,
  `data` text,
  `health_record_type_id` int(11) DEFAULT NULL,
  `created_time` datetime DEFAULT NULL,
  `created_user` varchar(100) DEFAULT NULL,
  `sync_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED;

      2) Dump with compression option

mysqldump test compress_test --enable-compressed-columns --no-data

DROP TABLE IF EXISTS `compress_test`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `compress_test` (
  `id` varchar(100) NOT NULL,
  `data` text /*!50633 COLUMN_FORMAT COMPRESSED */,
  `health_record_type_id` int(11) DEFAULT NULL,
  `created_time` datetime DEFAULT NULL,
  `created_user` varchar(100) DEFAULT NULL,
  `sync_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED;

       3) Along with respective dictionaries

mysqldump test compress_test --enable-compressed-columns --no-data --enable-compressed-columns-with-dictionaries

/*!50633 DROP COMPRESSION_DICTIONARY IF EXISTS `test_dictionary` */;
/*!50633 CREATE COMPRESSION_DICTIONARY `test_dictionary` ('\", \"false\", \"name\", \"Artesunate\", \"true\", \"remarks, \"wyqwteqgqwwqtest\" ,\"30\", \"pack\", \"heartRate\", \"treatDate\", \"wyqwteqgqwwqtest etest teste testedededwyqwteqgqwwqtest etest teste testededed\"\"') */;
DROP TABLE IF EXISTS `compress_test`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `test_records` (
  `id` varchar(100) NOT NULL,
  `data` text /*!50633 COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY `test_dictionary` */,
  `health_record_type_id` int(11) DEFAULT NULL,
  `created_time` datetime DEFAULT NULL,
  `created_user` varchar(100) DEFAULT NULL,
  `sync_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED;

Rollback:

Decompression of a column can be done by changing the column format other than compressed (fixed/dynamic,default).

mysql> alter table compress_test modify column data text column_format default;

Also, whenever we are compressing/decompressing column, we are forced to use only copy algorithm in Alter command. Inplace won’t work as compression needs a complete rebuild.

mysql> alter table test_records modify column data text column_format compressed,algorithm=inplace,lock=none;
ERROR 1846 (0A000): ALGORITHM=INPLACE is not supported. Reason: Cannot change column type INPLACE. Try ALGORITHM=COPY.

Hope this blog helps you to know about column compression feature in MySQL. Happy compressing !!

Featured image by Belinda Fewings on Unsplash

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s