MySQL load data infile made faster .

Loading any large file into MySQL server using the LOAD DATA INFILE is a time consuming process , because it is single threaded and it is a single transaction too. But with modern hardwares system resource is not a bottle neck. At Mydbops we focus on improving the efficiency of process as we value performance more. MySQL introduced the parallel load data operations in its latest minor release MySQL 8.0.17 . I had the curiosity to test this feature and wanted to know, how it can improve the data loading comparing to the existing single threaded method  . Through this blog I am going to  compare the both methods .

Remember you can use the parallel data loading utility only via MySQL Shell .

Internal Work Flow :

This section describes the working flow of the parallel data loading in MySQL 8.0.17 .

  • The importTable utility will analyse the input data file
  • The importTable utility will divide the large file into chunks
  • The importTable utility will update the chunks to destination MySQL server with parallel threads

Lab Environment :

Below is the server configuration , which I was used to test the data loading operation .

  • 4 core CPU
  • 8 GB RAM
  • SSD Disk
  • Centos 7

I have a CSV file to be loaded , the size of the file is around 8 GB .

sakthisrii:mysql-files sakthivel$ ls -lrth
-rw-r--r-- 1 root _mysql 8.06G Sep 12 18:16 load_data.csv

Let’s start the test with the existing method ( single threaded LOAD DATA INFILE ) .

Loading via load data infile:

MySQL localhost:33060+ ssl osm SQL > load data infile '/usr/local/mysql/mysql-files/load_data.csv' into table single.single_load fields terminated by ',' optionally enclosed by '"' lines terminated by '\n' (user_id,visible,name,size,latitude,longitude,timestamp,public,description,inserted);

Query OK, 14244516 rows affected, 0 warnings (39 min 35.5036 sec)
Records: 14244516 Deleted: 0 Skipped: 0 Warnings: 0

Well it took 39 minutes and 36 seconds .  No, I am going to load same data on a different table , with new Muti threaded in MySQL shell utility .

With multi threaded in MySQL Shell.

MySQL localhost osm JS > util.importTable("/usr/local/mysql/mysql-files/load_data.csv", {schema: "parallel", table: "parallel_load", columns: ["user_id","visible","name","size","latitude","longitude","timestamp","public","description","inserted"], dialect: "csv-unix", skipRows: 0, showProgress: true, fieldsOptionallyEnclosed: true, fieldsTerminatedBy: ",", linesTerminatedBy: "\n",fieldsEnclosedBy: '"',threads: 8, bytesPerChunk: "1G", maxRate: "2G"})

Importing from file '/usr/local/mysql/mysql-files/load_data.csv' to table `parallel`.`parallel_load` in MySQL Server at /tmp%2Fmysql.sock using 8 threads
[Worker01] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
 [Worker02] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
 [Worker03] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
 [Worker04] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
 [Worker05] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
 [Worker06] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
 [Worker07] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0
 [Worker08] parallel.parallel_load: Records: 1780564 Deleted: 0 Skipped: 0 Warnings: 0

100% (8.06 GB / 8.06 GB), 535.17 KB/s
File '/usr/local/mysql/mysql-files/load_data.csv' (8.06 GB) was imported in 6 min 30.0411 sec at 18.81 MB/s
Total rows affected in parallel.parallel_load: Records: 14244516 Deleted: 0 Skipped: 0 Warnings: 0

Yes, It took only 6 minutes and 30 seconds . This is 6x faster than the single threaded method and improved the speed tremendously.

Options needs to be taken care :

Below are the important options which involved in the performance of effective data loading. Every thresholds should be provided with the optimal values based on the available system resources ( CPU / RAM / Disk IOPS ) else it can degrade the performance.

  • Threads
  • BytesPerChunk
  • MaxRate

Threads : 

You can define the number of parallel threads to process the data from the input file to the target server. The default value is 8 threads

BytesPerChunk :

This defines the size of the chunk for each LOAD DATA call . All the threads will process the separate chunk during the operation . We can define the threshold based on the available core and file size .

MaxRate:

The maximum limit on data throughput in bytes per second per thread. Use this option if you need to avoid saturating the network or the I/O or CPU for the client host or target server.

Hope this blog helps to identify the difference between the existing load data operation and latest MySQL Shell utility parallel data loading operation . At Mydbops, We are keep testing the new things on MySQL and related tools, will be back soon with an exciting blog soon.

Featured image by Joao Marcelo Marques on Unsplash

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s