There are various methods for modifying a table's structure online, but pt-online-schema-change (pt-osc) is the most convenient and widely preferred way to perform an alter online, and it offers more granular control. However, it can lead to data loss if proper precautions are not taken.
In this blog, we are going to add a unique key to a table using pt-osc. The table structure is shared below.
mysql> show create table test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `Personid` int(11) NOT NULL AUTO_INCREMENT,
  `LastName` varchar(255) NOT NULL,
  `FirstName` varchar(255) DEFAULT NULL,
  `Age` int(11) DEFAULT NULL,
  PRIMARY KEY (`Personid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
I have inserted 1000 rows using mysql_random_data_load.
[vagrant@centos11 ~]$ ./mysql_random_data_load db test 1000
INFO[2021-01-02T09:44:16Z] Starting
0s [====================================================================] 100%
INFO[2021-01-02T09:44:16Z] 1000 rows inserted
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.01 sec)
Now I am deliberately inserting 4 duplicate rows into the table.
mysql> insert into test (LastName,FirstName,Age) values ('GR','Praveen',23);
Query OK, 1 row affected (0.00 sec)

mysql> insert into test (LastName,FirstName,Age) values ('GR','Praveen',23);
Query OK, 1 row affected (0.00 sec)

mysql> insert into test (LastName,FirstName,Age) values ('GR','Praveen',23);
Query OK, 1 row affected (0.00 sec)

mysql> insert into test (LastName,FirstName,Age) values ('GR','Praveen',23);
Query OK, 1 row affected (0.00 sec)
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|     1004 |
+----------+
1 row in set (0.01 sec)
Now the table holds 1004 records; the 4 duplicates were inserted manually to simulate non-unique data. A composite unique key will be added over the columns (FirstName, LastName). As usual, I am going to perform a dry run with pt-osc before the actual alter.
[vagrant@centos11 ~]$ pt-online-schema-change D=db,t=test --host=localhost --alter 'ADD UNIQUE (FirstName,LastName)' --recursion-method=none --alter-foreign-keys-method=auto --no-check-replication-filters --chunk-time=2 --dry-run
A software update is available:
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
No foreign keys reference `db`.`test`; ignoring --alter-foreign-keys-method.
Starting a dry run.  `db`.`test` will not be altered.  Specify --execute instead of --dry-run to alter the table.
You are trying to add an unique key. This can result in data loss if the data is not unique.
Please read the documentation for the --check-unique-key-change parameter.
You can check if the column(s) contain duplicate content by running this/these query/queries:

SELECT IF(COUNT(DISTINCT FirstName, LastName) = COUNT(*),
       'Yes, the desired unique index currently contains only unique values',
       'No, the desired unique index contains duplicated values. There will be data loss'
) AS IsThereUniqueness
FROM `db`.`test`;

Keep in mind that these queries could take a long time and consume a lot of resources

Dry run complete.  `db`.`test` was not altered.
Case 1) With --no-check-unique-key-change
The option --no-check-unique-key-change sounds like an easy way to get past this check, so let us try it out.
[vagrant@centos11 ~]$ pt-online-schema-change D=db,t=test --host=localhost --alter 'ADD UNIQUE (FirstName,LastName)' --recursion-method=none --alter-foreign-keys-method=auto --no-check-replication-filters --chunk-time=2 --no-check-unique-key-change --execute
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
No slaves found.  See --recursion-method if host centos11 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
No foreign keys reference `db`.`test`; ignoring --alter-foreign-keys-method.
Altering `db`.`test`...
Creating new table...
Created new table db._test_new OK.
Altering new table...
Altered `db`.`_test_new` OK.
2021-01-02T09:59:04 Creating triggers...
2021-01-02T09:59:04 Created triggers OK.
2021-01-02T09:59:04 Copying approximately 1000 rows...
2021-01-02T09:59:04 Copied rows OK.
2021-01-02T09:59:04 Analyzing new table...
2021-01-02T09:59:04 Swapping tables...
2021-01-02T09:59:04 Swapped original and new tables OK.
2021-01-02T09:59:04 Dropping old table...
2021-01-02T09:59:04 Dropped old table `db`.`_test_old` OK.
2021-01-02T09:59:04 Dropping triggers...
2021-01-02T09:59:04 Dropped triggers OK.
Successfully altered `db`.`test`.
The alter looks successful. Now let us cross-validate the data.
mysql> show create table test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `Personid` int(11) NOT NULL AUTO_INCREMENT,
  `LastName` varchar(255) NOT NULL,
  `FirstName` varchar(255) DEFAULT NULL,
  `Age` int(11) DEFAULT NULL,
  PRIMARY KEY (`Personid`),
  UNIQUE KEY `FirstName` (`FirstName`,`LastName`)
) ENGINE=InnoDB AUTO_INCREMENT=1006 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|     1001 |
+----------+
1 row in set (0.00 sec)
Three rows have silently disappeared: the count dropped from 1004 to 1001. Losing records like this in a production environment is unacceptable.
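To see which of the four identical rows survived the copy, we can query for the value we inserted earlier (this uses the sample data from this post):

mysql> select * from test where LastName = 'GR' and FirstName = 'Praveen';

Only one of the four rows should come back; the other three were silently discarded during the copy, for reasons explained below.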
Case 2) With the query suggested by --dry-run
Let us use the solution prescribed in the --dry-run logs.
SELECT IF(COUNT(DISTINCT FirstName, LastName) = COUNT(*),
       'Yes, the desired unique index currently contains only unique values',
       'No, the desired unique index contains duplicated values. There will be data loss'
) AS IsThereUniqueness
FROM `db`.`test`;
The above query checks the uniqueness of the data and tells us whether data loss would occur.
mysql> SELECT IF(COUNT(DISTINCT FirstName, LastName) = COUNT(*),
    ->        'Yes, the desired unique index currently contains only unique values',
    ->        'No, the desired unique index contains duplicated values. There will be data loss'
    -> ) AS IsThereUniqueness
    -> FROM `db`.`test`;
+----------------------------------------------------------------------------------+
| IsThereUniqueness |
+----------------------------------------------------------------------------------+
| No, the desired unique index contains duplicated values. There will be data loss |
+----------------------------------------------------------------------------------+
1 row in set (0.01 sec)
From this we know the table contains duplicate values; the next step is to find out which rows they are. Use the queries below to get the distinct count instead of the one shared by pt-osc, because the pt-osc query builds a temporary table, which on a large table may even lead to a server crash.
mysql> select count(DISTINCT FirstName, LastName) from test;
+-------------------------------------+
| count(DISTINCT FirstName, LastName) |
+-------------------------------------+
|                                1001 |
+-------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * from test a join (SELECT LastName,FirstName,Age,count(*) from test group by LastName,FirstName HAVING count(*) > 1) b ON a.FirstName = b.FirstName AND a.LastName = b.LastName AND a.Age = b.Age ORDER BY a.FirstName;
+----------+----------+-----------+------+----------+-----------+------+----------+
| Personid | LastName | FirstName | Age  | LastName | FirstName | Age  | count(*) |
+----------+----------+-----------+------+----------+-----------+------+----------+
|     1001 | GR       | Praveen   |   23 | GR       | Praveen   |   23 |        4 |
|     1006 | GR       | Praveen   |   23 | GR       | Praveen   |   23 |        4 |
|     1007 | GR       | Praveen   |   23 | GR       | Praveen   |   23 |        4 |
|     1008 | GR       | Praveen   |   23 | GR       | Praveen   |   23 |        4 |
+----------+----------+-----------+------+----------+-----------+------+----------+
4 rows in set (0.00 sec)
These are the duplicate values, and they are exactly what was lost during the pt-osc run: pt-osc copies data into the new table with INSERT IGNORE, so rows that violate the new unique key are silently discarded. We therefore need to deal with these duplicates before adding the unique key to the table.
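One way to clean them up is a self-join DELETE that keeps a single row per (FirstName, LastName) pair. This is only a minimal sketch, assuming we keep the row with the lowest Personid; test it on a copy of the data first.

-- Keep the row with the lowest Personid for each (FirstName, LastName)
-- pair and delete the rest. Rows with NULL FirstName never match the
-- join, which matches unique-index semantics (NULLs may repeat).
START TRANSACTION;

DELETE a
FROM test a
JOIN test b
  ON  a.FirstName = b.FirstName
  AND a.LastName  = b.LastName
  AND a.Personid  > b.Personid;

-- Re-check before committing: should now report only unique values.
SELECT COUNT(DISTINCT FirstName, LastName) = COUNT(*) AS is_unique FROM test;

COMMIT;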
Conclusion
I hope this post shows how to add a unique key to a table in a safer manner. Keep the temporary-table cost in mind as well: prefer the queries shared above over the standard one, so we can avoid unexpected server crashes due to the tmp table.
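To recap, here is a sketch of the safer sequence, using the same table and pt-osc options from this post (adjust for your environment):

# 1. Check for duplicates first, using the lighter queries shared above.
# 2. Remove any duplicate rows (see the DELETE sketch earlier).
# 3. Only then run the alter; --no-check-unique-key-change is now harmless
#    because the data is already unique.
pt-online-schema-change D=db,t=test --host=localhost \
  --alter 'ADD UNIQUE (FirstName,LastName)' \
  --recursion-method=none --no-check-replication-filters \
  --chunk-time=2 --no-check-unique-key-change --execute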