3 Step Migration of MySQL Data to ClickHouse for Faster Analytics

Recently one of our clients approached Mydbops with query slowness in a MySQL environment. They had deployed new code to generate large year-end analytics reports, and after the deployment the queries were extremely slow. After struggling with it for a while, they came to us for a solution. Our analysis showed that, as expected for an OLAP workload, the database was IO bound, with disk IOPS at 100% utilisation during report generation. The queries were starving for disk IO, which slowed the whole process.

Problem statement:

  • The reports focus mainly on two large log tables (emp_Report_model, emp_details).
  • The report generator (a stored procedure) uses COUNT(*) to build the aggregated data on each call. This is required for their business purpose.
  • COUNT(*) is terribly slow in MySQL (they were on MySQL 5.7) because InnoDB has to count all the rows in the table. (MySQL 8.0 has InnoDB parallel read threads that can make COUNT(*) faster.)
  • A MySQL index cannot help here: each call aggregates almost the complete data set (about 90% of the rows), so the queries end up as full table scans (FTS).
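The COUNT(*) gap above can be sketched with a toy model (hypothetical data structures, not real engine code): a row store has to walk every row to count them, while a columnar store such as ClickHouse records the row count of each immutable data part and can answer COUNT(*) from that metadata alone.

```python
# Toy illustration of why COUNT(*) costs differ between a row store
# and a column store. These are made-up structures for demonstration.

# Row-store view of a table: a million individual rows.
rows = [{"id": i, "event": "click"} for i in range(1_000_000)]

def rowstore_count(table):
    # InnoDB-style: visit every row just to count them (a full scan).
    return sum(1 for _ in table)

# Column-store view: each data part already knows how many rows it holds.
parts_row_counts = [400_000, 350_000, 250_000]  # rows per data part

def columnar_count(parts):
    # ClickHouse-style: sum a handful of metadata entries.
    return sum(parts)

print(rowstore_count(rows))          # touches 1,000,000 rows
print(columnar_count(parts_row_counts))  # touches 3 numbers
```

Both return the same answer, but the second does constant work per data part instead of per row, which is why the COUNT(*) timings later in this post differ by orders of magnitude.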

Further analysis showed that these tables receive an INSERT-only workload; there are no UPDATEs or DELETEs on them.

We proposed to overcome the problem by migrating the data to ClickHouse.

What is ClickHouse?

ClickHouse is an open-source, column-oriented database management system capable of real-time generation of analytical data reports using SQL queries.

— ClickHouse website

The major limitation of MySQL to ClickHouse replication is that we can only apply INSERT statements (appends) from MySQL. The replication pipeline does not handle UPDATEs and DELETEs, which makes sense for an append-oriented columnar database.

ClickHouse Installation:

The installation is quite straightforward. The steps are available on the ClickHouse official website:

yum install rpm-build epel-release
curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash
yum install -y mysql-community-devel python34-devel python34-pip gcc python-devel libevent-devel gcc-c++ kernel-devel libxslt-devel libffi-devel openssl-devel python36 python36-devel python36-libs python36-tools

ClickHouse server

yum install -y clickhouse-server clickhouse-client

ClickHouse MySQL replication library

pip3 install clickhouse-mysql

ClickHouse startup:

[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server status
clickhouse-server service is stopped
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server start
Start clickhouse-server service: /etc/init.d/clickhouse-server: line 166: ulimit: open files: cannot modify limit: Operation not permitted
Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/
DONE
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server status
clickhouse-server service is running
[root@mydbopslabs192 ~]# clickhouse-client
ClickHouse client version 19.17.4.11.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.17.4 revision 54428.

mydbopslabs192 πŸ™‚ show databases;

SHOW DATABASES
β”Œβ”€name────┐
β”‚ default β”‚
β”‚ system  β”‚
β””─────────β”˜
2 rows in set. Elapsed: 0.003 sec.

The installation is all set. Next we need to migrate the data from MySQL to ClickHouse and configure replication for those tables.

Data Migration from MySQL to ClickHouse:

Step 1 (Dump the ClickHouse-compatible schema structure from MySQL):

First we need to migrate the MySQL table structures to ClickHouse. MySQL and ClickHouse have different data types, so we cannot apply the MySQL structure to ClickHouse as-is. The document below provides a neat comparison of MySQL and ClickHouse data types:

https://shinguz.ch/blog/clickhouse-data-types-compared-to-mysql-data-types/
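As a rough guide, the common conversions look like the sketch below. This is a partial, hand-written mapping for illustration only; the clickhouse-mysql tool generates the actual DDL, and NULL-able MySQL columns become `Nullable(T)` in ClickHouse, as seen in the dump output later in this step.

```python
# Partial MySQL -> ClickHouse type mapping (illustrative only; the
# clickhouse-mysql tool applies its own, more complete rules).
MYSQL_TO_CLICKHOUSE = {
    "TINYINT":  "Int8",
    "SMALLINT": "Int16",
    "INT":      "Int32",
    "BIGINT":   "Int64",
    "FLOAT":    "Float32",
    "DOUBLE":   "Float64",
    "VARCHAR":  "String",
    "TEXT":     "String",
    "DATE":     "Date",
    "DATETIME": "DateTime",
}

def clickhouse_type(mysql_type, nullable=True):
    """Map a MySQL column type; NULL-able columns wrap in Nullable(T)."""
    ch = MYSQL_TO_CLICKHOUSE[mysql_type.upper()]
    return f"Nullable({ch})" if nullable else ch

print(clickhouse_type("INT"))       # Nullable(Int32)
print(clickhouse_type("DATETIME"))  # Nullable(DateTime)
```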

Let us convert the table structure from MySQL to ClickHouse using the clickhouse-mysql tool.

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --create-table-sql-template --with-create-database --src-tables=data_Analytics.emp_Report_model,data_Analytics.emp_details > data_Reports_Jan21st.sql
2020-01-21 09:03:40,150/1579597420.150730:INFO:Starting
2020-01-21 09:03:40,150/1579597420.150977:DEBUG:{'app': {'binlog_position_file': None,
'config_file': '/etc/clickhouse-mysql/clickhouse-mysql.conf',
'create_table_json_template': False,
2020-01-21 09:03:40,223/1579597420.223511:DEBUG:Connect to the database host=192.168.168.191 port=3306 user=clickhouse password=Click@321 db=data_Analytics
2020-01-21 09:03:40,264/1579597420.264610:DEBUG:Connect to the database host=192.168.168.191 port=3306 user=clickhouse password=Click@321 db=data_Analytics

Checking the dumped table structure:

[root@mydbopslabs192 ~]# less data_Reports_Jan21st.sql | grep CREATE
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_details (
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_Report_model (
[root@mydbopslabs192 ~]# cat data_Reports_Jan21st.sql | head -n7
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_details (
WatchID Nullable(String),
JavaEnable Nullable(Int32),
Title Nullable(String),
GoodEvent Nullable(Int32),
EventTime Nullable(DateTime),

Step 2 (Import the schema structure into ClickHouse):

[root@mydbopslabs192 ~]# clickhouse-client -mn < data_Reports_Jan21st.sql
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# fg
clickhouse-client

mydbopslabs192 πŸ™‚ use data_Analytics
USE data_Analytics
Ok.
0 rows in set. Elapsed: 0.001 sec.
mydbopslabs192 πŸ™‚ show tables;
SHOW TABLES
β”Œβ”€name─────────────┐
β”‚ emp_Report_model β”‚
β”‚ emp_details      β”‚
β””──────────────────β”˜
2 rows in set. Elapsed: 0.003 sec.

Step 3 (Migrate the data and keep replication in sync):

Before configuring the replication, the MySQL server should be configured with the variables below.

Mandatory MySQL settings:

server-id = <your id>
binlog_format = ROW
binlog_row_image = FULL

Now we can configure the replication in two ways:

  • Migrate the existing data, then configure the replication
  • Migrate the existing data and configure the replication in one step

i) Migrate the existing data, then configure the replication:

Command to migrate the existing data:

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --migrate-table --src-tables=data_Analytics.emp_Report_model --dst-host=127.0.0.1 --dst-schema data_Analytics --dst-table emp_Report_model --log-file=emp_Report_model.log
[root@mydbopslabs192 ~]# less emp_Report_model.log | grep -i migra
'migrate_table': True,
'table_migrator': {'clickhouse': {'connection_settings': {'host': '127.0.0.1',
2020-01-21 11:04:57,744/1579604697.744533:INFO:List for migration:
2020-01-21 11:04:57,744/1579604697.744947:INFO:Start migration data_Analytics.emp_Report_model
2020-01-21 11:04:57,891/1579604697.891935:INFO:migrate_table. sql=SELECT WatchID,JavaEnable,Title,GoodEvent,EventTime,Eventdate,CounterID,ClientIP,ClientIP6,RegionID,UserID,CounterClass,OS,UserAgent,URL,Referer,URLDomain,RefererDomain,Refresh,IsRobot,RefererCategories,URLCategories,URLRegions,RefererRegions,ResolutionWidth,ResolutionHeight,ResolutionDepth,FlashMajor,FlashMinor,FlashMinor2,NetMajor,NetMinor,UserAgentMajor,UserAgentMinor,CookieEnable,JavascriptEnable,IsMobile,MobilePhone,MobilePhoneModel,Params,IPNetworkID,TraficSourceID,SearchEngineID,SearchPhrase,AdvEngineID,IsArtifical,WindowClientWidth,WindowClientHeight,ClientTimeZone,ClientEventTime,SilverlightVersion1,SilverlightVersion2,SilverlightVersion3,SilverlightVersion4,PageCharset,CodeVersion,IsLink,IsDownload,IsNotBounce,FUniqID,HID,IsOldCounter,IsEvent,IsParameter,DontCountHits,WithHash,HitColor,UTCEventTime,Age,Sex,Income,Interests,Robotness,GeneralInterests,RemoteIP,RemoteIP6,WindowName,OpenerName,HistoryLength,BrowserLanguage,BrowserCountry,SocialNetwork,SocialAction,HTTPError,SendTiming,DNSTiming,ConnectTiming,ResponseStartTiming,ResponseEndTiming,FetchTiming,RedirectTiming,DOMInteractiveTiming,DOMContentLoadedTiming,DOMCompleteTiming,LoadEventStartTiming,LoadEventEndTiming,NSToDOMContentLoadedTiming,FirstPaintTiming,RedirectCount,SocialSourceNetworkID,SocialSourcePage,ParamPrice,ParamOrderID,ParamCurrency,ParamCurrencyID,GoalsReached,OpenstatServiceName,OpenstatCampaignID,OpenstatAdID,OpenstatSourceID,UTMSource,UTMMedium,UTMCampaign,UTMContent,UTMTerm,FromTag,HasGCLID,RefererHash,URLHash,CLID,YCLID,ShareService,ShareURL,ShareTitle,IslandID,RequestNum,RequestTry FROM data_Analytics.emp_Report_model
mydbopslabs192 πŸ™‚ select count(*) from data_Analytics.emp_Report_model;
β”Œβ”€count()─┐
β”‚ 8873898 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
1 rows in set. Elapsed: 0.005 sec.

Configuring the replication:

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --src-tables=data_Analytics.emp_Report_model --dst-host=127.0.0.1 --dst-schema data_Analytics --dst-table emp_Report_model --src-resume --src-wait --nice-pause=1 --log-level=info --csvpool --mempool-max-flush-interval=60 --mempool-max-events-num=1000 --pump-data --src-server-id=100 --log-file=emp_Report_model_Replication.log
2020-01-21 11:22:18,974/1579605738.974186:INFO:CSVWriter() self.path=/tmp/csvpool_1579605738.9738157_d643efe5-5ae0-47df-8504-40f61f2c139f.csv
2020-01-21 11:22:18,976/1579605738.976613:INFO:CHCSWriter() connection_settings={'port': 9000, 'host': '127.0.0.1', 'password': '', 'user': 'default'} dst_schema=data_Analytics dst_table=emp_Report_model
2020-01-21 11:22:18,976/1579605738.976936:INFO:starting clickhouse-client process
2020-01-21 11:22:19,160/1579605739.160906:INFO:['data_Analytics.emp_Report_model']
2020-01-21 11:22:19,166/1579605739.166096:INFO:['data_Analytics.emp_Report_model']
2020-01-21 11:22:19,170/1579605739.170744:INFO:['data_Analytics.emp_Report_model']

ii) Migrate the existing data and configure the replication in one step:

Here we need to add the --migrate-table flag to the replication command.

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host=192.168.168.191 --src-user=clickhouse --src-password=Click@321 --src-tables=data_Analytics.emp_Report_model --dst-host=127.0.0.1 --dst-schema data_Analytics --dst-table emp_Report_model --src-resume --src-wait --nice-pause=1 --log-level=info --csvpool --mempool-max-flush-interval=60 --mempool-max-events-num=1000 --pump-data --src-server-id=100 --migrate-table --log-file=emp_Report_model_replication_mig.log
[root@mydbopslabs192 ~]# less emp_Report_model_replication_mig.log | grep -i mig
2020-01-21 11:27:53,263/1579606073.263505:INFO:List for migration:
2020-01-21 11:27:53,263/1579606073.263786:INFO:Start migration data_Analytics.emp_Report_model
2020-01-21 11:27:53,316/1579606073.316788:INFO:migrate_table. sql=SELECT WatchID,JavaEnable,Title,GoodEvent,EventTime,Eventdate,CounterID,ClientIP,ClientIP6,RegionID,UserID,CounterClass,OS,UserAgent,URL,Referer,URLDomain,RefererDomain,Refresh,IsRobot,RefererCategories,URLCategories,URLRegions,RefererRegions,ResolutionWidth,ResolutionHeight,ResolutionDepth,FlashMajor,FlashMinor,FlashMinor2,NetMajor,NetMinor,UserAgentMajor,UserAgentMinor,CookieEnable,JavascriptEnable,IsMobile,MobilePhone,MobilePhoneModel,Params,IPNetworkID,TraficSourceID,SearchEngineID,SearchPhrase,AdvEngineID,IsArtifical,WindowClientWidth,WindowClientHeight,ClientTimeZone,ClientEventTime,SilverlightVersion1,SilverlightVersion2,SilverlightVersion3,SilverlightVersion4,PageCharset,CodeVersion,IsLink,IsDownload,IsNotBounce,FUniqID,HID,IsOldCounter,IsEvent,IsParameter,DontCountHits,WithHash,HitColor,UTCEventTime,Age,Sex,Income,Interests,Robotness,GeneralInterests,RemoteIP,RemoteIP6,WindowName,OpenerName,HistoryLength,BrowserLanguage,BrowserCountry,SocialNetwork,SocialAction,HTTPError,SendTiming,DNSTiming,ConnectTiming,ResponseStartTiming,ResponseEndTiming,FetchTiming,RedirectTiming,DOMInteractiveTiming,DOMContentLoadedTiming,DOMCompleteTiming,LoadEventStartTiming,LoadEventEndTiming,NSToDOMContentLoadedTiming,FirstPaintTiming,RedirectCount,SocialSourceNetworkID,SocialSourcePage,ParamPrice,ParamOrderID,ParamCurrency,ParamCurrencyID,GoalsReached,OpenstatServiceName,OpenstatCampaignID,OpenstatAdID,OpenstatSourceID,UTMSource,UTMMedium,UTMCampaign,UTMContent,UTMTerm,FromTag,HasGCLID,RefererHash,URLHash,CLID,YCLID,ShareService,ShareURL,ShareTitle,IslandID,RequestNum,RequestTry FROM data_Analytics.emp_Report_model
[root@mydbopslabs192 ~]# less emp_Report_model_replication_mig.log | grep -i process
2020-01-21 11:28:01,071/1579606081.071054:INFO:starting clickhouse-client process

Validating the count after inserting some records in MySQL (the source):

mydbopslabs192 πŸ™‚ select count(*) from data_Analytics.emp_Report_model;
β”Œβ”€count()─┐
β”‚ 8873900 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

MySQL to ClickHouse replication is working as expected.

Performance comparison for an OLAP workload (MySQL vs ClickHouse):

COUNT(*) in MySQL:

mysql> select count(*) from emp_Report_model;

1 row in set (32.68 sec)

COUNT(*) in ClickHouse:

mydbopslabs192 πŸ™‚ select count(*) from emp_Report_model;

1 rows in set. Elapsed: 0.007 sec.

Analytical (join) query in MySQL:

mysql> select emp_Report_model.WatchID,emp_Report_model.JavaEnable,emp_Report_model.Title,emp_Report_model.RegionID from emp_Report_model inner join emp_details on emp_Report_model.WatchID=emp_details.WatchID and emp_Report_model.RegionID=emp_details.RegionID and emp_Report_model.UserAgentMajor=emp_details.UserAgentMajor where emp_Report_model.SocialSourcePage is not null and emp_details.FetchTiming != 0 order by emp_Report_model.WatchID;

292893 rows in set (1 min 2.61 sec)

Analytical (join) query in ClickHouse:

mydbopslabs192 πŸ™‚ select emp_Report_model.WatchID,emp_Report_model.JavaEnable,emp_Report_model.Title,emp_Report_model.RegionID from emp_Report_model inner join emp_details on emp_Report_model.WatchID=emp_details.WatchID and emp_Report_model.RegionID=emp_details.RegionID and emp_Report_model.UserAgentMajor=emp_details.UserAgentMajor where emp_Report_model.SocialSourcePage is not null and emp_details.FetchTiming != 0 order by emp_Report_model.WatchID;

292893 rows in set. Elapsed: 1.710 sec. Processed 9.37 million rows, 906.15 MB (7.75 million rows/s., 749.15 MB/s.)

Yes, ClickHouse performs very well with COUNT(*) and analytical queries.

Query model    MySQL        ClickHouse
COUNT(*)       33 seconds   0.1 seconds
OLAP query     63 seconds   1.7 seconds

[Graph: MySQL vs ClickHouse query timings]

The above graph is just a pictorial representation of the queries tested. Though ClickHouse excels at analytics workloads, it has its own limitations too. Now we have another happy customer at Mydbops whose analytics dashboard is faster.

Featured image credit: Stephen Dawson on Unsplash
