pt-archiver: PK index is not always efficient
pt-archiver probably requires unique index to be able to ascend and archive the rows in "chunks". pt-archiver always picks primary key regardless If other unique keys in source table is efficient enough. Due to this, it will slow down entire job process. Below is test case:
CREATE TABLE archive_test (
ID INT UNSIGNED NOT NULL AUTO_INCREMENT,
UID INT UNSIGNED NOT NULL DEFAULT '0',
NAME CHAR(10) NOT NULL,
PHONE INT UNSIGNED DEFAULT '0',
PRIMARY KEY (`ID`),
UNIQUE KEY `UID_IDX` (`UID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I executed pt-archiver with --dry-run option to print queries.
pt-archiver --user=irfan --password=****** --source h=localhost,
--where "UID IN (6508962,
--no-delete --limit=10000 \
--statistics --dry-run
SELECT /*!40001 SQL_NO_CACHE */ `id`,`uid`
WHERE (UID IN (6508962,
ORDER BY `id` LIMIT 10000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`uid`
WHERE (UID IN (6508962,
ORDER BY `id` LIMIT 10000
And below is equivalent EXPLAIN of above SELECT & it picked PRIMARY key with range scan.
mysql> EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id`,`uid`
WHERE (UID IN (6508962,
ORDER BY `id` LIMIT 10000;
+----+-
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-
| 1 | SIMPLE | archive_test | range | PRIMARY | PRIMARY | 4 | NULL | 53547 | Using where |
+----+-
1 row in set (0.00 sec)
Below, I forced pt-archiver to use other unique key in table (UID_IDX) as per --where criteria.
pt-archiver --user=irfan --password=****** --source h=localhost,
--where "UID IN (6508962,
--no-delete --limit=10000 \
--statistics --dry-run
SELECT /*!40001 SQL_NO_CACHE */ `id`,`uid`
WHERE (UID IN (6508962,
ORDER BY `uid` LIMIT 10000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`uid`
WHERE (UID IN (6508962,
ORDER BY `uid` LIMIT 10000
And here is equivalent EXPLAIN output of above produced SELECT for archival. And as per EXPLAIN it can be verified that UID_IDX unique index here is far efficient then PRIMARY key.
mysql> EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id`,`uid`
WHERE (UID IN (6508962,
ORDER BY `uid` LIMIT 10000;
+----+-
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-
| 1 | SIMPLE | archive_test | range | UID_IDX | UID_IDX | 4 | NULL | 5 | Using index condition |
+----+-
1 row in set (0.00 sec)
Either pt-archiver should be smart enough to pick right unique index as PK is not always efficient as in this case as per query execution plan or there should be some option to pt-archiver to ignore PK so tool can pick other unique index.
Blueprint information
- Status:
- Not started
- Approver:
- None
- Priority:
- Undefined
- Drafter:
- Muhammad Irfan
- Direction:
- Needs approval
- Assignee:
- None
- Definition:
- New
- Series goal:
- None
- Implementation:
- Unknown
- Milestone target:
- None
- Started by
- Completed by