I was involved in one of the development support for MySQL. Here is an interesting case
The application needs a Sequence generator as it was not present inbuilt with MySQL ( MariaDB will have it in 10.3 ) they have made sequence generator function with ‘select for update‘ statement. But it tends to cause more locking at high concurrency with their XtraDB Cluster deployment.
To get rid of this , the sequence generation were made as direct Update queries and sequence is fetched using Select @next.
UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'dealer';
It was supposed to fix the problem but it does not in the real world test scenario. The table structure is below
show create table EO_PK_TABLE\G *************************** 1. row *************************** Table: EO_PK_TABLE Create Table: CREATE TABLE `EO_PK_TABLE` ( `NAME` char(40) DEFAULT NULL, `PK` decimal(11,0) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1
Application Exception:
<MySQLExpression: "UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'contact';" withBindings: >: Next exception:SQL State:41000 -- error code: 1205 -- msg: Lock wait timeout exceeded; try restarting transaction"}
Let us have a look at the InnoDB Engine status
---TRANSACTION 2821395544, ACTIVE 212 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s) MySQL thread id 3397, OS thread handle 139772570343168, query id 195064 10.0.5.128 appuser updating UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'contact' ------- TRX HAS BEEN WAITING 212 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 1087 page no 3 n bits 248 index PRIMARY of table `ccbuser`.`EO_PK_TABLE` trx id 2821395544 lock_mode X locks rec but not gap waiting
The processlist snap shot
There is a lock over the index , but what is causing the lock and why it happens ?
---TRANSACTION 2821458353, ACTIVE 135 sec fetching rows mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 2 MySQL thread id 5455, OS thread handle 139772578862848, query id 590796 10.0.5.128 appuser Searching rows for update UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'service' ------- TRX HAS BEEN WAITING 135 SEC FOR THIS LOCK TO BE GRANTED:
It is noted that the transaction have UNDO log entries. While Updating the records from application perspective , the sequence generation was supposed to be the first step in the transaction. The UNDO entries further for the queries gives the hint that it is not start of the transaction or there is an open transaction.
1) Setting the ISOLATION LEVEL 2) Increment the EO_PK_TABLE based on NAME Column. 3)Fetch the next ID 4)Insert into relevant table 5)Commit.
Query Log for Analysis .
I have enabled the query log on this non-prod environment to get the query write pattern. But at cases the transactions are not committed by the application .After the UPDATE over EO_PK_TABLE it keeps on Updating the columns in EO_PK_TABLE.
Let us consider a transaction handled by Query ID 5068.
5068 Query commit -- Transaction starts 5068 Query set tx_isolation='READ-COMMITTED' 5068 Query UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'contact' 5068 Query SELECT @next contactId 5068 Query set tx_isolation='READ-COMMITTED' 5068 Query UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'account_vod_thrsh' 5068 Query SELECT @next id 5068 Query set tx_isolation='READ-COMMITTED' 5068 Query UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'address' 5068 Query SELECT @next addrId 5068 Query set tx_isolation='READ-COMMITTED' 5068 Query UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'service' 5068 Query SELECT @next servId 5068 Query set tx_isolation='READ-COMMITTED' 5068 Query UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'customer_registration' 5068 Query SELECT @next id 5068 Query set tx_isolation='READ-COMMITTED' 5068 Query UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'location' 5068 Query SELECT @next locId 5068 Query set tx_isolation='READ-COMMITTED' 5068 Query UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'bill_delivery' 5068 Query SELECT @next billDelId 5068 Query set tx_isolation='READ-COMMITTED' 5068 Query UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'account' 5068 Query SELECT @next acctId 5068 Query set tx_isolation='READ-COMMITTED' 5068 Query UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'bob' 5068 Query SELECT @next bobId 5068 Query INSERT INTO account(sales_person, do_not_auto_script, direct_acct, mso_id, parental_control, create_date, bu_id, sales_method, acct_id, cancel_reasons_id, tv_dma_id, tax_number, verified_date, status_change_date, external_creation_date, customer_id, attach_bill_cycle, contact_id, compliment_acct, acct_comment, acct_status_id, complaintmode_acct, bill_del_type_id, legacy_id, acct_type_id, tax_exempt, verified, psa_acct_id, bb_acct_id, anonymized, acct_reg_num, email_inv_format, mso_name, verified_channel, offer_type, external_id_suffix, external_id_prefix, active_logins_count, acct_bill_mode, shipping_add_id) VALUES (NULL, 'F', NULL, NULL, NULL, '2017-11-30 14:43:16', 65, 'Call Center', 42389538, NULL, '001', NULL, NULL, NULL, NULL, '171130144316899', 'F', 42391292, 'F', NULL, 1, NULL, NULL, NULL, 2, NULL, NULL, NULL, '171130144316899', NULL, '20171130144316912', NULL, NULL, NULL, NULL, '30144316912', '201711', NULL, 'Monthly', 4.4585868E+7) 5068 Query rollback
This makes sense the application is not closing the transaction perfectly. With persistent connection they use the same connection thread without a proper COMMIT. It causes the lock over the records and the dead lock arises. Now we have the root cause of the problem .
As it is non-prod environment I had the luxury to troubleshoot using Query log. But query log on production ENV will come with a high penalty on IO. So it is good to use the Performance Schema inbuilt with MySQL for trouble shooting.
Performance Schema to Troubleshoot:
The Performance schema is used to analyze the transaction history and its relevant info. Here the query ID = 5511 and Transaction ID = 2821461197
*************************** 3. row *************************** PROCESS ID: 5511 TXN ID: 2821461197 EVENT_ID: 10911 EVENT NAME: statement/sql/insert SQL: INSERT INTO account(sales_person, do_not_auto_script, direct_acct, mso_id, parental_control, create_date, bu_id, sales_method, acct_id, cancel_reasons_id, tv_dma_id, tax_number, verified_date, status_change_date, external_creation_date, customer_id, attach_bill_cycle, contact_id, compliment_acct, acct_comment, acct_status_id, complaintmode_acct, bill_del_type_id, legacy_id, acct_type_id, tax_exempt, verified, psa_acct_id, bb_acct_id, anonymized, acct_reg_num, email_inv_format, mso_name, verified_channel, offer_type, external_id_suffix, external_id_prefix, active_logins_count, acct_bill_mode, shipping_add_id) VALUES (NULL, 'F', NULL, NULL, NULL, '2017-11-30 17:05:19', 65, 'Call Center', 42392550, NULL, '001', NULL, NULL, NULL, NULL, '171130155238743', 'F', 42394399, 'F', NULL, 1, NULL, NULL, NULL, 2, NULL, NULL, NULL, '171130155238743', NULL, '20171130170520142', NULL, NULL, NULL, NULL, '30170520142', '201711', NULL, 'Monthly', 4.458888E+7) *************************** 4. row *************************** PROCESS ID: 5511 TXN ID: 2821461197 EVENT_ID: 10912 EVENT NAME: statement/sql/rollback SQL: rollback *************************** 5. row *************************** PROCESS ID: 5511 TXN ID: 2821461197 EVENT_ID: 10913 EVENT NAME: statement/sql/insert SQL: INSERT INTO account(sales_person, do_not_auto_script, direct_acct, mso_id, parental_control, create_date, bu_id, sales_method, acct_id, cancel_reasons_id, tv_dma_id, tax_number, verified_date, status_change_date, external_creation_date, customer_id, attach_bill_cycle, contact_id, compliment_acct, acct_comment, acct_status_id, complaintmode_acct, bill_del_type_id, legacy_id, acct_type_id, tax_exempt, verified, psa_acct_id, bb_acct_id, anonymized, acct_reg_num, email_inv_format, mso_name, verified_channel, offer_type, external_id_suffix, external_id_prefix, active_logins_count, acct_bill_mode, shipping_add_id) VALUES (NULL, 'F', NULL, NULL, NULL, '2017-11-30 17:05:20', 65, 'Call Center', 42392549, NULL, '001', NULL, NULL, NULL, NULL, '171130155032832', 'F', 42394398, 'F', NULL, 1, NULL, NULL, NULL, 2, NULL, NULL, NULL, '171130155032832', NULL, '20171130170520142', NULL, NULL, NULL, NULL, '30170520142', '201711', NULL, 'Monthly', 4.4588879E+7) *************************** 6. row *************************** PROCESS ID: 5511 TXN ID: 2821461197 EVENT_ID: 10914 EVENT NAME: statement/sql/rollback SQL: rollback *************************** 7. row *************************** PROCESS ID: 5511 TXN ID: 2821461197 EVENT_ID: 10915 EVENT NAME: statement/sql/update SQL: UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'address' *************************** 8. row *************************** PROCESS ID: 5511 TXN ID: 2821461197 EVENT_ID: 10916 EVENT NAME: statement/sql/select SQL: SELECT @next addrId *************************** 9. row *************************** PROCESS ID: 5511 TXN ID: 2821461197 EVENT_ID: 10917 EVENT NAME: statement/sql/update SQL: UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'bob' *************************** 10. row *************************** PROCESS ID: 5511 TXN ID: 2821461197 EVENT_ID: 10918 EVENT NAME: statement/sql/select SQL: SELECT @next bobId 10 rows in set (0.00 sec)
These information and stats were passed on to development team to fix the code. Once the issue is nailed down perfectly it can be fixed easily. The code was fixed, tested and shipped to production. Now the DB queries scales better .
For what it is worth, the CREATE SEQUENCE implementation in MariaDB 10.3 skips any transaction overhead in InnoDB. There is no undo logging and no transactional locking inside InnoDB (and therefore no possibility of transactional deadlocks). Access is controlled by locks in the SQL layer as well as by short-term InnoDB rw-locks and mutexes (buffer page lock, redo log mutex).
Sequences are internally stored as special no-rollback tables inside InnoDB. It would be much harder to disable undo logging if there were any secondary indexes or off-page columns. For sequences, we do not need those, and furthermore, there is only one sequence record in the table, so there will only be one index page.
LikeLiked by 1 person