pt-table-checksum TODO
These features were not implemented in 2.0.1 (https:/
Developers: once we decide on a version in which to do some of these features, create a new blueprint like pt-table-
###
### 1. Remove --check-interval
We don't need --check-interval anymore. A 1-second interval is too long when the target chunk time may be much shorter. Instead, the tool can just sleep for --chunk-time when replicas are lagging.
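The wait loop this implies can be sketched as follows. This is an illustrative Python sketch, not the tool's Perl code: `wait_for_replicas`, `get_max_lag`, and the injectable `sleep` are hypothetical names, and `max_lag` stands in for whatever lag threshold the tool is configured with.

```python
import time


def wait_for_replicas(get_max_lag, max_lag, chunk_time, sleep=time.sleep):
    """Sleep in chunk_time steps until the worst replica lag is acceptable.

    get_max_lag is a hypothetical callable returning the highest lag
    (in seconds) across all replicas. Returns the total time slept.
    """
    waited = 0.0
    while get_max_lag() > max_lag:
        sleep(chunk_time)  # no separate check interval: reuse the chunk time
        waited += chunk_time
    return waited


if __name__ == "__main__":
    lags = iter([5.0, 3.0, 0.5])  # simulated lag readings, in seconds
    total = wait_for_replicas(lambda: next(lags), max_lag=1.0,
                              chunk_time=0.5, sleep=lambda s: None)
    print(total)
```

With a short chunk time the tool rechecks lag more often than once per second, which is the point of dropping the separate option.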
###
### 2. Extended checksum table columns
Consider adding skipped tables to the checksum table, with no checksums, no rows, no chunks, and skipped=1, so we can find skipped tables with a query on skipped>chunks.
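As a rough illustration of the "skipped>chunks" query, the sketch below uses an in-memory SQLite table. The table layout (one summary row per table with `chunks` and `skipped` columns) and the sample rows are assumptions made for the example, not the tool's actual checksum table schema.

```python
import sqlite3


def build_demo_table():
    """Create a hypothetical per-table summary with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checksums (db TEXT, tbl TEXT, chunks INTEGER, skipped INTEGER)"
    )
    conn.executemany(
        "INSERT INTO checksums VALUES (?, ?, ?, ?)",
        [
            ("app", "users", 4, 0),    # fully checksummed
            ("app", "orders", 10, 2),  # some chunks skipped
            ("app", "logs", 0, 1),     # whole table skipped: no chunks, skipped=1
        ],
    )
    return conn


def skipped_tables(conn):
    """Return (db, tbl) for tables that were skipped entirely."""
    query = "SELECT db, tbl FROM checksums WHERE skipped > chunks ORDER BY db, tbl"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(skipped_tables(build_demo_table()))
```

A table with a few skipped chunks still has chunks >= skipped, so only the entirely skipped table matches.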
###
### 3. Add --filter to replace some removed features like --modulo and --offset
The filter is Perl code that is compiled into a function, similar to how pt-query-digest handles its --filter option. The function receives a hashref named $chunk with information about each chunk, with the following keys:
- db, the database name
- tbl, the table name
- chunk, the chunk number
If the filter returns 1, the chunk is checksummed. If it returns 0, the chunk is skipped, and the SKIPPED column is incremented. If the filter throws an error, the whole table is skipped, and the error is printed.
An example filter to do approximately 1/7th of the table every day:
--filter '($chunk->{chunk} % 7) == (sprintf("%d", time/86400) % 7)'
An example filter to skip a table for some reason:
--filter 'die "Skipping table $chunk-
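The filter semantics described above, together with the 1/7th-per-day example, can be sketched in Python for illustration. The real filter would be Perl; `rotation_filter` and `run_filter` are hypothetical names, chunk numbering from 1 is an assumption, and what the SKIPPED count should be when the filter throws is not specified in the text, so the sketch simply reports zero in that case.

```python
import time

SECONDS_PER_DAY = 86400


def rotation_filter(chunk, now=None):
    """Python analogue of the example filter: chunk % 7 == day % 7."""
    now = time.time() if now is None else now
    day = int(now // SECONDS_PER_DAY)  # whole days since the epoch
    return chunk["chunk"] % 7 == day % 7


def run_filter(filter_fn, db, tbl, n_chunks):
    """Apply filter_fn to every chunk of one table.

    Returns (checksummed_chunks, skipped_count, error). If the filter
    raises, the whole table is skipped and the error text is returned.
    """
    checksummed, skipped = [], 0
    for number in range(1, n_chunks + 1):  # assumption: chunks start at 1
        chunk = {"db": db, "tbl": tbl, "chunk": number}
        try:
            keep = filter_fn(chunk)
        except Exception as exc:
            return [], 0, str(exc)
        if keep:
            checksummed.append(number)
        else:
            skipped += 1
    return checksummed, skipped, None


if __name__ == "__main__":
    day_three = 3 * SECONDS_PER_DAY
    print(run_filter(lambda c: rotation_filter(c, now=day_three), "db", "t", 14))
```

Over any seven consecutive days every chunk number matches exactly once, so the whole table is covered once per week.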
###
### 4. Automatically avoid false positives
a) Automatically use --float-precision to avoid false positives. Set --float-precision to a default value of 12 (TBD). Alternatively, if the tool is checking --replicate-check (as it should by default), it could notice out-of-sync chunks and decrease its float precision when the table has a float/double column. But that seems silly: why not just use a lower value to begin with? Let's defer this item and return to it later.
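To illustrate why a fixed float precision avoids this kind of false positive, here is a small Python sketch. It is an analogy only: `row_checksum` is a hypothetical helper, and MD5 over a joined string is a stand-in for whatever checksum expression the tool builds in SQL.

```python
import hashlib


def row_checksum(values, float_precision=None):
    """Checksum a row, optionally formatting floats to a fixed precision."""
    parts = []
    for value in values:
        if isinstance(value, float) and float_precision is not None:
            # Fixed number of digits after the decimal point.
            parts.append(f"{value:.{float_precision}f}")
        else:
            parts.append(repr(value))
    return hashlib.md5("#".join(parts).encode()).hexdigest()


if __name__ == "__main__":
    source_row = (1, 0.1 + 0.2)  # stored as 0.30000000000000004
    replica_row = (1, 0.3)
    # Full-precision string conversion makes the rows look different.
    print(row_checksum(source_row) == row_checksum(replica_row))
    # At 12 digits both values format identically.
    print(row_checksum(source_row, 12) == row_checksum(replica_row, 12))
```

Values that genuinely differ within the chosen precision still produce different checksums, so a lower precision trades sensitivity to tiny differences for fewer spurious mismatches.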
###
### 5. Add safety checks
a. Replication filter
Add replication filter checks similarly to how pt-table-sync does it: if there are any binlog_
b. Table existence
Before checksumming any table, check for its existence on all replicas. If it doesn't exist, skip it with the message "Skipping $db.$tbl because it doesn't exist on $host." This feature is billable to issue 19429.
c. binlog_format on replicas
When recursing to replicas and checking for filters and so forth, also check for binlog_
d. Timezone
Check the time zone of every connection the tool opens. If any of them doesn't match the others, stop with an error message indicating that you can solve it by setting --set-vars time_zone=foo. See also bug 912470.
e. read_only
Detect whether the server the tool is run against is read_only; if so, the tool may have been pointed at a replica by accident, so warn and stop.
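Checks b, d, and e above can be sketched as simple functions over per-server facts. This is a hypothetical Python illustration: the dictionary layout (`host`, `tables`, `time_zone`, `read_only`) and the function names are assumptions, and a real implementation would gather these values by querying each connection. Checks a and c are left out because their descriptions are incomplete here.

```python
def missing_table_messages(replicas, db, tbl):
    """Check b: one skip message per replica that lacks the table."""
    return [
        f"Skipping {db}.{tbl} because it doesn't exist on {replica['host']}."
        for replica in replicas
        if (db, tbl) not in replica["tables"]
    ]


def time_zone_error(servers):
    """Check d: return an error string if the connections disagree."""
    zones = sorted({server["time_zone"] for server in servers})
    if len(zones) > 1:
        return ("Connections use different time_zone values: "
                + ", ".join(zones))
    return None


def read_only_error(server):
    """Check e: a read_only server is probably a replica."""
    if server["read_only"]:
        return f"{server['host']} is read_only; it may be a replica."
    return None


if __name__ == "__main__":
    source = {"host": "db1", "tables": {("app", "users")},
              "time_zone": "UTC", "read_only": False}
    replica = {"host": "db2", "tables": set(),
               "time_zone": "SYSTEM", "read_only": True}
    print(missing_table_messages([replica], "app", "users"))
    print(time_zone_error([source, replica]))
    print(read_only_error(source))
```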
###
### 6. Add --recheck
This feature needs to be implemented as it was in v1.0, where it looks at the replicate table and re-checksums any chunks found to be different on one or more replicas.
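The selection step for --recheck might look like the following Python sketch. It assumes each row read from a replica's replicate table carries `db`, `tbl`, `chunk`, `this_crc`, `this_cnt`, `master_crc`, and `master_cnt`; the function name and the row-as-dict representation are hypothetical.

```python
def chunks_to_recheck(rows_by_replica):
    """Return sorted (db, tbl, chunk) keys that differ on any replica.

    rows_by_replica maps a replica name to a list of row dicts read from
    that replica's replicate table (assumed column names, see lead-in).
    """
    differing = set()
    for rows in rows_by_replica.values():
        for row in rows:
            crc_differs = row["this_crc"] != row["master_crc"]
            cnt_differs = row["this_cnt"] != row["master_cnt"]
            if crc_differs or cnt_differs:
                differing.add((row["db"], row["tbl"], row["chunk"]))
    return sorted(differing)


if __name__ == "__main__":
    ok = {"db": "app", "tbl": "users", "chunk": 1,
          "this_crc": "aa", "master_crc": "aa", "this_cnt": 10, "master_cnt": 10}
    bad = {"db": "app", "tbl": "users", "chunk": 2,
           "this_crc": "bb", "master_crc": "cc", "this_cnt": 10, "master_cnt": 10}
    print(chunks_to_recheck({"replica1": [ok, bad], "replica2": [ok]}))
```

A chunk that differs on several replicas appears once, so it is re-checksummed a single time.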
###
### 7. Make the checksum queries use LOW_PRIORITY hint
I think pt-archiver has a --low-priority option or something; see if we can emulate that too. This should be enabled by default, just like innodb_
Blueprint information
- Status: Not started
- Approver: Baron Schwartz
- Priority: Undefined
- Drafter: Baron Schwartz
- Direction: Approved
- Assignee: None
- Definition: Discussion
- Series goal: None
- Implementation: Informational
- Milestone target: None
- Started by
- Completed by