记一次MySql死锁排查

排查记录

业务

用户通过访问他人的个人主页删除访问用户-被访用户记录,并重新插入。(为什么删除再插入而不是update,由于此系统的另一套数据库里的访客不存在 customerId, targetCustomerId 的唯一索引, 所以肯定会有脏数据. 为了兼容,这里直接删掉规避更新异常)

  • 伪代码
    //外层Async调用
    @Transactional(rollbackFor = Exception.class)
    @Override
    public void saveOrUpdate(Long customerId, Long visitCustomerId) {
        deleteVisit(customerId, visitCustomerId);
        customerVisitDao.insertVisit(customerId, visitCustomerId);
    }

(伪)表结构

CREATE TABLE `customer_visit` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `created_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '创建时间',
  `customer_id` bigint(20) NOT NULL DEFAULT '-1' COMMENT '访问用户',
  `visit_customer_id` bigint(20) NOT NULL DEFAULT '-1' COMMENT '被访问用户',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_customer_id_visit_customer_id` (`customer_id`,`visit_customer_id`),
  KEY `index_customer_id` (`customer_id`),
  KEY `index_visit_customer_id` (`visit_customer_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='访客记录';

分析死锁原因

通过show engine innodb status查看最近一次的死锁日志

------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-04-08 22:58:23 7f8d2fbff700
//事务1 编号694322167 活跃0.005s 事务状态:根据索引读取数据
*** (1) TRANSACTION: 

TRANSACTION 694322167, ACTIVE 0.005 sec starting index read
//事务1使用一个表 表锁1
mysql tables in use 1, locked 1
//等待锁链表的长度为2 当前事务持有的行记录锁/gap(间隙锁)1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
LOCK BLOCKING MySQL thread id: 5828505 block 5835430
MySQL thread id 5835430, OS thread handle 0x7f8d6afff700, query id 4891428548 192.168.0.47 web_user updating
//事务1正在等待锁的sql
DELETE FROM customer_visit
        where
        customer_id = 735***68
        and
        visit_customer_id =624***90
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
//事务1 正在等待表customer_visit上索引uniq_customer_id_visit_customer_id的X锁(排他锁)-记录锁 
RECORD LOCKS space id 47 page no 25368 n bits 616 index `uniq_customer_id_visit_customer_id` of table `huayan`.`customer_visit` trx id 694322167 lock_mode X locks rec but not gap waiting
Record lock, heap no 502 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 8; hex 800000000462ed68; asc      b h;;
 1: len 8; hex 8000000003b97a96; asc       z ;;
 2: len 8; hex 8000000000eb5fb1; asc       _ ;;
 

//事务2 编号694322166 活跃0.005s 事务状态:插入数据
*** (2) TRANSACTION:
TRANSACTION 694322166, ACTIVE 0.005 sec inserting
//事务2使用一个表 表锁1
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 2
MySQL thread id 5828505, OS thread handle 0x7f8d2fbff700, query id 4891428551 192.168.0.48 web_user update
insert customer_visit(customer_id,visit_customer_id)
      values (735***68,624***90)

//事务2 持有表customer_visit上索引uniq_customer_id_visit_customer_id的X锁 记录锁
//由于是RC隔离模式下的基于唯一索引的等值查询,会申请一个记录锁
//此处是事务2通过delete fromcustomer_visit where customer_id = 735***68 and visit_customer_id =624***90申请的锁
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 47 page no 25368 n bits 616 index `uniq_customer_id_visit_customer_id` of table `huayan`.`customer_visit` trx id 694322166 lock_mode X locks rec but not gap
Record lock, heap no 502 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 8; hex 800000000462ed68; asc      b h;;
 1: len 8; hex 8000000003b97a96; asc       z ;;
 2: len 8; hex 8000000000eb5fb1; asc       _ ;;
 
//事务2 正在申请S锁(共享锁) 
//此处是事务2通过insert customer_visit(customer_id,visit_customer_id) values (735***68,624***90) 申请的
//insert语句在普通情况下是会申请排他锁,也就是X锁,但是这里出现了S锁。这是因为a字段是一个唯一索引
//所以insert语句会在插入前进行一次duplicate key的检查,为了使这次检查成功,需要申请S锁防止其他事务对a字段进行修改
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 47 page no 25368 n bits 616 index `uniq_customer_id_visit_customer_id` of table `huayan`.`customer_visit` trx id 694322166 lock mode S waiting
Record lock, heap no 502 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 8; hex 800000000462ed68; asc      b h;;
 1: len 8; hex 8000000003b97a96; asc       z ;;
 2: len 8; hex 8000000000eb5fb1; asc       _ ;;

综上,推测出MySQL死锁原因: 对于同一个的字段的锁申请是需要排队的,针对uniq_customer_id_visit_customer_id索引,T2 insert申请的S锁之前,T1的delete在申请X锁,而T1的X锁又在等待T2 Delete申请的X锁释放,T2的S锁在等待T1的X锁申请,形成循环等待,导致死锁。

并发的情况下,发生死锁情况

事务1 事务2
begin
DELETE FROM customer_visit where customer_id = 1001 and visit_customer_id =1002[执行成功,事务2占有1001-1002uniq的X锁,类型为记录锁]
begin
DELETE FROM customer_visit where customer_id = 1001 and visit_customer_id =1002[事务1希望申请1001-1002uniq的X锁,事务2已经申请到了,事务wait,X锁申请进入锁请求队列]
Deadlock insert customer_visit(customer_id,visit_customer_id) values (1001,1002)[事务2需要申请1001-1002uniq的S锁以便检查duplicate key,排在事务1的1001-1002uniq的X之后,形成循环等待 事务1等待事务2commit,事务2等待事务1commit,造成死锁]

解决

消除uniq_customer_id_visit_customer_id的X锁等待,在事务delete之前,先进行select查询是否存在记录,不存在则不进行delete操作,避免事务获取到X锁,避免循环等待。

另一种死锁情况(非主键索引更新引起的死锁)

业务

数据库表相比情况一缺少了uniq_customer_id_visit_customer_id这个唯一索引(主要是历史问题导致的,一开始就没有唯一索引,后续已经产生了很多脏数据不好添加索引),业务逻辑是相同的。

非主键索引行锁相关

关于MySql行锁

行级锁并不是直接锁记录,而是锁索引,如果一条SQL语句用到了主键索引,mysql会锁住主键索引;如果一条语句操作了非主键索引,mysql会先锁住非主键索引,再锁定主键索引。

依据索引执行DELETE/UPDATE的执行步骤

  1. 由于用到了非主键索引,首先需要获取index_customer_id|index_visit_customer_id上的行级锁(锁非聚簇索引)
  2. 根据主键进行更新,所以需要获取主键上的行级锁(锁聚簇索引)
  3. 更新完毕,提交并释放所有的锁

语句分析EXPLAIN

发现执行的时候使用的type是index merge。在mysql5.0之前,一个表仅仅能使用一个索引,从5.1开始,引入了 index merge 优化技术,对同一个表可以使用多个索引分别进行条件扫描。 使用index merge的情况下 ,update/delete 需要对多个非主键索引相继获取锁,再获取主键上的锁。

并发情况下,死锁发生

在并发情况下,可能会出现

事务1 事务2
begin
DELETE FROM customer_visit where customer_id = 1003 and visit_customer_id =1002[走主键索引,获取pk锁]
begin
DELETE FROM customer_visit where customer_id = 1001 and visit_customer_id =1002[走非主键索引:获取index锁]
Deadlock 事务2等待获取index锁,事务1等待获取pk锁,形成循环等待,导致死锁