Investigating a MySQL dead lock

I was involved in one of the development support for MySQL. Here is an interesting case

The application needs a Sequence generator as it was not present inbuilt with MySQL ( MariaDB will have it in 10.3 ) they have made sequence generator function with ‘select for update‘ statement. But it tends to cause more locking at high concurrency with their XtraDB Cluster deployment.

To get rid of this , the sequence generation were made as direct Update queries and sequence is fetched using Select @next.

UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'dealer';

It was supposed to fix the problem but it does not in the real world test scenario. The table structure is below

show create table EO_PK_TABLE\G
*************************** 1. row ***************************
       Table: EO_PK_TABLE
Create Table: CREATE TABLE `EO_PK_TABLE` (
  `NAME` char(40) DEFAULT NULL,
  `PK` decimal(11,0) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Application Exception: 

<MySQLExpression: "UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'contact';" withBindings: >:
    Next exception:SQL State:41000 
-- error code: 1205 -- msg: Lock wait timeout exceeded; try restarting transaction"}

Let us have a look at the InnoDB Engine status

---TRANSACTION 2821395544, ACTIVE 212 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 3397, OS thread handle 139772570343168, query id 195064 10.0.5.128 appuser updating
UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'contact'
------- TRX HAS BEEN WAITING 212 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1087 page no 3 n bits 248 index PRIMARY of table `ccbuser`.`EO_PK_TABLE` trx id 2821395544 lock_mode X locks rec but not gap waiting

The processlist snap shot

Dead lock

There is a lock over the index , but what is causing the lock and why it happens ?

---TRANSACTION 2821458353, ACTIVE 135 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 2
MySQL thread id 5455, OS thread handle 139772578862848, query id 590796 10.0.5.128 appuser Searching rows for update
UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'service'
------- TRX HAS BEEN WAITING 135 SEC FOR THIS LOCK TO BE GRANTED:

 

It is noted that the transaction have UNDO log entries. While Updating the records from application perspective , the sequence generation was supposed to be the first step in the transaction. The UNDO entries further for the queries gives the hint that it is not start of the transaction or there is an open transaction.

1) Setting the ISOLATION LEVEL 
2) Increment the EO_PK_TABLE based on NAME Column.
3)Fetch the next ID 
4)Insert into relevant table 
5)Commit.

 

Query Log for Analysis .

I have enabled the query log on this non-prod environment to get the query write pattern. But at cases the transactions are not committed by the application .After the UPDATE over EO_PK_TABLE  it keeps on Updating the columns in  EO_PK_TABLE.

Let us consider a transaction handled by Query ID 5068.

5068 Query     commit
-- Transaction starts
5068 Query     set tx_isolation='READ-COMMITTED'
5068 Query     UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'contact'
5068 Query     SELECT @next contactId
5068 Query     set tx_isolation='READ-COMMITTED'
5068 Query     UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'account_vod_thrsh'
5068 Query     SELECT @next id
5068 Query     set tx_isolation='READ-COMMITTED'
5068 Query     UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'address'
5068 Query     SELECT @next addrId
5068 Query     set tx_isolation='READ-COMMITTED'
5068 Query     UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'service'
5068 Query     SELECT @next servId
5068 Query     set tx_isolation='READ-COMMITTED'
5068 Query     UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'customer_registration'
5068 Query     SELECT @next id
5068 Query     set tx_isolation='READ-COMMITTED'
5068 Query     UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'location'
5068 Query     SELECT @next locId
5068 Query     set tx_isolation='READ-COMMITTED'
5068 Query     UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'bill_delivery'
5068 Query     SELECT @next billDelId
5068 Query     set tx_isolation='READ-COMMITTED'
5068 Query     UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'account'
5068 Query     SELECT @next acctId
5068 Query     set tx_isolation='READ-COMMITTED'
5068 Query     UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'bob'
5068 Query     SELECT @next bobId
5068 Query     INSERT INTO account(sales_person, do_not_auto_script, direct_acct, mso_id, parental_control, create_date, bu_id, sales_method, acct_id, cancel_reasons_id, tv_dma_id, tax_number, verified_date, status_change_date, external_creation_date, customer_id, attach_bill_cycle, contact_id, compliment_acct, acct_comment, acct_status_id, complaintmode_acct, bill_del_type_id, legacy_id, acct_type_id, tax_exempt, verified, psa_acct_id, bb_acct_id, anonymized, acct_reg_num, email_inv_format, mso_name, verified_channel, offer_type, external_id_suffix, external_id_prefix, active_logins_count, acct_bill_mode, shipping_add_id) VALUES (NULL, 'F', NULL, NULL, NULL, '2017-11-30 14:43:16', 65, 'Call Center', 42389538, NULL, '001', NULL, NULL, NULL, NULL, '171130144316899', 'F', 42391292, 'F', NULL, 1, NULL, NULL, NULL, 2, NULL, NULL, NULL, '171130144316899', NULL, '20171130144316912', NULL, NULL, NULL, NULL, '30144316912', '201711', NULL, 'Monthly', 4.4585868E+7)
5068 Query     rollback

 

This makes sense the application is not closing the transaction perfectly. With persistent connection they use the same connection thread without a proper COMMIT. It causes the lock over the records and the dead lock arises. Now we have the root cause of the problem .

As it is non-prod environment I had the luxury to troubleshoot using Query log. But query log on production ENV will come with a high penalty on IO. So it is good to use the Performance Schema inbuilt with MySQL for trouble shooting.

Performance Schema to Troubleshoot:

The Performance schema is used to analyze the transaction history and its relevant info. Here the query ID = 5511 and Transaction ID = 2821461197

*************************** 3. row ***************************
PROCESS ID: 5511
    TXN ID: 2821461197
  EVENT_ID: 10911
EVENT NAME: statement/sql/insert
       SQL: INSERT INTO account(sales_person, do_not_auto_script, direct_acct, mso_id, parental_control, create_date, bu_id, sales_method, acct_id, cancel_reasons_id, tv_dma_id, tax_number, verified_date, status_change_date, external_creation_date, customer_id, attach_bill_cycle, contact_id, compliment_acct, acct_comment, acct_status_id, complaintmode_acct, bill_del_type_id, legacy_id, acct_type_id, tax_exempt, verified, psa_acct_id, bb_acct_id, anonymized, acct_reg_num, email_inv_format, mso_name, verified_channel, offer_type, external_id_suffix, external_id_prefix, active_logins_count, acct_bill_mode, shipping_add_id) VALUES (NULL, 'F', NULL, NULL, NULL, '2017-11-30 17:05:19', 65, 'Call Center', 42392550, NULL, '001', NULL, NULL, NULL, NULL, '171130155238743', 'F', 42394399, 'F', NULL, 1, NULL, NULL, NULL, 2, NULL, NULL, NULL, '171130155238743', NULL, '20171130170520142', NULL, NULL, NULL, NULL, '30170520142', '201711', NULL, 'Monthly', 4.458888E+7)
*************************** 4. row ***************************
PROCESS ID: 5511
    TXN ID: 2821461197
  EVENT_ID: 10912
EVENT NAME: statement/sql/rollback
       SQL: rollback
*************************** 5. row ***************************
PROCESS ID: 5511
    TXN ID: 2821461197
  EVENT_ID: 10913
EVENT NAME: statement/sql/insert
       SQL: INSERT INTO account(sales_person, do_not_auto_script, direct_acct, mso_id, parental_control, create_date, bu_id, sales_method, acct_id, cancel_reasons_id, tv_dma_id, tax_number, verified_date, status_change_date, external_creation_date, customer_id, attach_bill_cycle, contact_id, compliment_acct, acct_comment, acct_status_id, complaintmode_acct, bill_del_type_id, legacy_id, acct_type_id, tax_exempt, verified, psa_acct_id, bb_acct_id, anonymized, acct_reg_num, email_inv_format, mso_name, verified_channel, offer_type, external_id_suffix, external_id_prefix, active_logins_count, acct_bill_mode, shipping_add_id) VALUES (NULL, 'F', NULL, NULL, NULL, '2017-11-30 17:05:20', 65, 'Call Center', 42392549, NULL, '001', NULL, NULL, NULL, NULL, '171130155032832', 'F', 42394398, 'F', NULL, 1, NULL, NULL, NULL, 2, NULL, NULL, NULL, '171130155032832', NULL, '20171130170520142', NULL, NULL, NULL, NULL, '30170520142', '201711', NULL, 'Monthly', 4.4588879E+7)
*************************** 6. row ***************************
PROCESS ID: 5511
    TXN ID: 2821461197
  EVENT_ID: 10914
EVENT NAME: statement/sql/rollback
       SQL: rollback
*************************** 7. row ***************************
PROCESS ID: 5511
    TXN ID: 2821461197
  EVENT_ID: 10915
EVENT NAME: statement/sql/update
       SQL: UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'address'
*************************** 8. row ***************************
PROCESS ID: 5511
    TXN ID: 2821461197
  EVENT_ID: 10916
EVENT NAME: statement/sql/select
       SQL: SELECT @next addrId
*************************** 9. row ***************************
PROCESS ID: 5511
    TXN ID: 2821461197
  EVENT_ID: 10917
EVENT NAME: statement/sql/update
       SQL: UPDATE EO_PK_TABLE SET PK = (@next := PK + 100) WHERE NAME = 'bob'
*************************** 10. row ***************************
PROCESS ID: 5511
    TXN ID: 2821461197
  EVENT_ID: 10918
EVENT NAME: statement/sql/select
       SQL: SELECT @next bobId
10 rows in set (0.00 sec)

 

These information and stats were passed on to development team to fix the code. Once the issue is nailed down perfectly it can be fixed easily. The code was fixed, tested and shipped to production. Now the DB queries scales better .

Advertisements

2 thoughts on “Investigating a MySQL dead lock

  1. For what it is worth, the CREATE SEQUENCE implementation in MariaDB 10.3 skips any transaction overhead in InnoDB. There is no undo logging and no transactional locking inside InnoDB (and therefore no possibility of transactional deadlocks). Access is controlled by locks in the SQL layer as well as by short-term InnoDB rw-locks and mutexes (buffer page lock, redo log mutex).

    Sequences are internally stored as special no-rollback tables inside InnoDB. It would be much harder to disable undo logging if there were any secondary indexes or off-page columns. For sequences, we do not need those, and furthermore, there is only one sequence record in the table, so there will only be one index page.

    Liked by 1 person

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s