Security: Data Masking in MySQL 8.0 server

Data security plays a major role in current age. Privacy matters a lot. Data masking is one of the Key Features when comes to security.

  • In MySQL community versions if you want to mask your data, You can go with a Maxscale load balancer.
  • They introduced a new masking filter on the Maxscale 2.1 version.
  • My colleague Prasanth written about column-level data masking. For reference click here Column Level Data Masking.
  • In MySQL 8.0 Enterprise they have introduced a new security feature called data masking.
  • This feature was introduced in MySQL 8.0.13 Enterprise Edition and it provides data masking and de-identification capabilities.

What is Masking ?

  • Transformation of existing data to mask it and remove identifying characteristics, for example, changing all digits of a credit card number but the last four to ‘X’ characters.
  • Generation of random data, such as email addresses and payment card numbers.

Diagram_1.jpg

In MySQL 8.0 they have introduced 14 functions,

1) gen_blacklist
2) gen_dictionary
3) gen_dictionary_drop
4) gen_dictionary_load
5) gen_range
6) gen_rnd_email
7) gen_rnd_pan
8) gen_rnd_ssn
9) gen_rnd_us_phone
10) mask_inner
11) mask_outer
12) mask_pan
13) mask_pan_relaxed
14) mask_ssn

Plugin Installation :

root@localhost:(none)>install plugin data_masking SONAME 'data_masking.so';
Query OK, 0 rows affected (0.10 sec)
  • In MySQL 8.0 there were different types of masking functions are available.
  • I have categorised those functions based on their usage.

Architecture Diagram :

Copy-of-Untitled-Project.jpg

General-Purpose Functions:

  • mask_inner
  • mask_outer

mask_inner :

  • It will mask the interior part of a string, We can specify the number of strings that needs to be masked.
Example :
root@localhost:mydbops_test>select mask_inner(aadhar_number,1,2) as aadhar_number,mask_inner(status,2,0) as status from aadhar_details;
+-----------------+--------+
| aadhar_number   | status |
+-----------------+--------+
| 1XXXXXXXXXXXX78 | ACXXXX |
| 9XXXXXXXXXXX78  | ACXXXX |
+-----------------+--------+
2 rows in set (0.01 sec)
  • In the above example, I have masked the (aadhar_number & status) column using mask_inner function. I have passed the 2 arguments.
  • The first argument indicates margin1. It will mask the inner part of the data.
  • The second argument indicates margin2. It will mask part of the data.
Note :
  • For using this mask_inner function you need minimum 3 characters.

mask_outer :

  • It will mask the left and right ends of a string. We can specify the number of strings that needs to be masked.
Example :
root@localhost:mydbops_test>select mask_outer(aadhar_number,3,2) as aadhar_number,mask_outer(status,2,2) as status from aadhar_details;
+-----------------+--------+
| aadhar_number   | status |
+-----------------+--------+
| XXX2334455667XX | XXTIXX |
| XXX883445566XX  | XXTIXX |
+-----------------+--------+
2 rows in set (0.00 sec)
  • In the above example, I have masked the (aadhar_number & status) column using mask_outer function.

I have passed the 2 arguments.

  • The first argument indicates margin1. It will mask the data from the beginning.
  • The second argument indicates margin2. It will mask the data from the end of the row.

Special-Purpose :

  • mask_pan
  • mask_pan_relaxed
  • mask_ssn

mask_pan :

  • The mask span function it will mask all the values from passed argument except the last four-digit.
Example :
root@localhost:mydbops_test>select mask_pan(aadhar_number),mask_pan(user_identity) from aadhar_details;
+-------------------------+-------------------------+
| mask_pan(aadhar_number) | mask_pan(user_identity) |
+-------------------------+-------------------------+
| XXXXXXXXXXX6778         | XXXXXXwxrin             |
| XXXXXXXXXX66201         | XXXXXXXrstg             |
+-------------------------+-------------------------+
2 rows in set (0.00 sec)
Note :
  • If the argument is shorter than required, it is returned unchanged value.
  • For using this (mask_pan) function, We need a minimum of 5 characters.
Example :
root@localhost:mydbops_test>select mask_pan('data');
+------------------+
| mask_pan('data') |
+------------------+
| data             |
+------------------+
1 row in set (0.00 sec)

mask_pan_relaxed :

  • The mask_pan_relaxed function will mask other characters except for first six digits and last 4 digits.
Example :
root@localhost:mydbops_test>select mask_pan_relaxed(photo) from aadhar_details;
+-------------------------------+
| mask_pan_relaxed(photo)       |
+-------------------------------+
| https:XXXXXXX.jpg             |
| https:XXXXXXXXXXXXXXXXXXX.jpg |
+-------------------------------+
2 rows in set (0.00 sec)
Note :
  • The string must be a suitable length,(ie.. string must be 11 characters long).Otherwise, this mask_pan_relaxed function did not check the string.

mask_ssn :

  • The mask_ssn is a Social Security number that returns the number except for last 4 digits. Other numbers are replaced by “X” characters.
Example :
root@localhost:mydbops_test>select mask_ssn('90963692211');
+-------------------------+
| mask_ssn('90963692211') |
+-------------------------+
| XXX-XX-2211             |
+-------------------------+
1 row in set (0.00 sec)

Generating Random Data with Specific Characteristics :

  • One of the nice functions of MySQL Data Masking is the ability to generate business-relevant datasets.
  • Because it is not always possible to test/simulate your application on your real dataset (indeed playing with the customer credit card or security social numbers is a very bad practice) this feature is very convenient.

Several functions are available,

gen_range() : returns a random integer selected from a given range.

gen_rnd_email() : returns a random email address in the example.com domain.

gen_rnd_pan() : returns a random payment card Primary Account Number.

gen_rnd_us_phone() : returns a random U.S. phone number.

Masking Data for Customer Identification :

  • This type of masking function is generating random data from the dictionary file.
  • Here the main advantage is we can load the dictionary files manually.
  • When a dictionary is loaded, it becomes part of the dictionary registry and is assigned a name to be used by other dictionary functions.
  • Dictionaries are loaded from plain text files containing one term per line. Empty lines are ignored. To be valid, a dictionary file must contain at least one non-empty line.

gen_dictionary :

  • It returns some random data from a dictionary.
Example :
mysql> select gen_dictionary('name') from aadhar_details;
+------------------------+
| gen_dictionary('name') |
+------------------------+
| La Trinité             |
| Fonds-Saint-Denis      |
+------------------------+
2 rows in set (0.00 sec)

gen_dictionary_load :

  • Loads a file into the dictionary registry and assigns the dictionary a name to be used with other functions that require a dictionary name argument.
  • In order to use data from a dictionary, we must first load the data.
Sample data :
[root@mydbopslabs26 mysql-files]# cat random_dictionary.txt
Basse-Pointe
Bellefontaine
Case-Pilote
Ducos
Fonds-Saint-Denis
Fort-de-France
Grand'Rivière
Gros-Morne
L'Ajoupa-Bouillon
La Trinité
  • We can load the dictionary files using (gen_dictionary_load) function.
Example :
mysql> SELECT gen_dictionary_load('/var/lib/mysql-files/random_dictionary','name');
+-----------------------------------------------------------------------+
| gen_dictionary_load('/var/lib/mysql-files/random_dictionary','name')  |
+-----------------------------------------------------------------------+
| Dictionary load success                                               |
+-----------------------------------------------------------------------+
1 row in set (0.00 sec)
Note :
  • To loading the dictionary file to MySQL you need to put the file under (secure_file_priv) directory.Otherwise, MySQL will not allow loading the dictionary file.
mysql> SELECT gen_dictionary_load('/home/mydbops/dictionary','user_identity');
+-----------------------------------------------------------------+
| gen_dictionary_load('/home/mydbops/dictionary','user_identity') |
+-----------------------------------------------------------------+
| Dictionary load error: dictionary file not readable             |
+-----------------------------------------------------------------+
1 row in set (0.00 sec)

How to use these functions in a production use case?

  • We can achieve this by creating views. While creating the views we can specify what are the columns needs to mask.
Sample Table:
mysql> desc aadhar_details;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| aadhar_number | bigint(20)   | NO   |     | NULL    |                |
| name          | varchar(16)  | NO   |     | NULL    |                |
| mobile        | int(11)      | YES  |     | NULL    |                |
| address       | varchar(150) | NO   |     | NULL    |                |
| email         | varchar(64)  | YES  |     | NULL    |                |
| photo         | blob         | YES  |     | NULL    |                |
| user_identity | blob         | YES  |     | NULL    |                |
| status        | varchar(16)  | YES  |     | NULL    |                |
| create_time   | timestamp    | NO   |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
10 rows in set (0.03 sec)

For the above table, I am masking a few columns, ie…

  • aadhar_number – The gen_range function will generate some random numbers.
  • mobile – The gen_rnd_us_phone function will generate random numbers in U.S standard format.
  • name & address – The gen_dictionary function will load random dictionary data.
  • email – The gen_rnd_email function will generate random email id’s.
  • photo – The mask_inner function will mask the inner part of the selected column.

Creating View :

mysql> create view mydbops_masking
as
select
  gen_range(900000000, 999999999) as aadhar_number,
  gen_rnd_us_phone() as mobile,
  gen_dictionary('name') as name,
  gen_dictionary('address') as address,
  gen_rnd_email() as email,
  mask_inner(photo,1,5) as photo
from aadhar_details;
Query OK, 0 rows affected (0.03 sec)

Now the data looks like below,

mysql> select aadhar_number,mobile,name,email,photo from mydbops_masking;
+---------------+---------------+------------------------------------------------------------------------+----------------------+-------------------------------+
| aadhar_number | mobile        | name                                                                   | email                | photo                         |
+---------------+---------------+------------------------------------------------------------------------+----------------------+-------------------------------+
| 976896841     | 1-555379-4677 | ˙∆√∆ç©√˙∫∫∆\∫∆\©¨∆∫˙                                                   | lQGeD5ZA@example.com | hXXXXXXXXXXXl.jpg             |
| 990742763     | 1-555330-5061 | ∆˙©ç†˙∆˚˙©©¨˙©ƒ∂†®ƒ©˙∆˜µ∆∫˙¨©ƒç                                        | dSODymDW@example.com | hXXXXXXXXXXXXXXXXXXXXXXXe.jpg |
+---------------+---------------+------------------------------------------------------------------------+----------------------+-------------------------------+
2 rows in set (0.00 sec)

Conclusion :

  • This is a really awesome feature in MySQL 8.0. But this feature is only available for Enterprise MySQL.
  • In Percona MySQL 8.0 they have implemented this same feature in open source. And they are expanding the Data-Masking plugin to implementing new features.

Featured image by H Shaw on Unsplash

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s