background-statistics
Currently, key distribution statistics are gathered in the foreground, which makes it hard to maintain fine-grained statistics. The idea is to move statistics gathering to a background thread. This would let us keep the statistics more relevant and refresh them more flexibly as new changes come in.
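A minimal sketch of the background collector follows. It assumes Python threads, an in-memory per-key count table, and hypothetical names (`BackgroundStatsCollector`, `record_change`, `flush`, `snapshot`). The real storage engine and how it reports changes are not specified by this blueprint. The write path only enqueues a change, and a daemon thread folds changes into the statistics, so keeping statistics current no longer adds work to foreground operations.

```python
import queue
import threading
from collections import Counter


class BackgroundStatsCollector:
    """Hypothetical sketch: writers enqueue key changes, and a daemon thread
    applies them to a key -> record-count table off the write path."""

    def __init__(self):
        self._changes = queue.Queue()
        self._counts = Counter()
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record_change(self, key, delta):
        # Called from the foreground (write) path: O(1), never touches the stats.
        self._changes.put((key, delta))

    def _run(self):
        while True:
            item = self._changes.get()
            if item is None:  # shutdown sentinel
                self._changes.task_done()
                break
            key, delta = item
            with self._lock:
                self._counts[key] += delta
                if self._counts[key] <= 0:
                    del self._counts[key]  # drop keys that no longer have records
            self._changes.task_done()

    def flush(self):
        # Block until every change enqueued so far has been applied.
        self._changes.join()

    def snapshot(self):
        # Consistent copy of the current statistics for readers (e.g. a planner).
        with self._lock:
            return dict(self._counts)

    def close(self):
        # Stop the worker after it drains the changes queued before the sentinel.
        self._changes.put(None)
        self._worker.join()
```

Because `snapshot()` copies the table under a lock, readers get a consistent view while the worker keeps applying changes. `flush()` is mainly useful in tests, where the caller needs all pending changes applied before checking the statistics.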
A background stats collector would also make it possible to implement a more sophisticated stats-gathering algorithm. Some ideas for the algorithm:
1. Keep an index-like on-disk structure (probably in a separate file, one file per key or per field) with a key1 -> count1; key2 -> count2; ... layout. This would be interpreted as "there are count2 - count1 records between key1 and key2".
2. Approximate the actual value distribution with a well-known distribution. This way we'd need to store only the distribution id and several distribution-
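Both ideas can be sketched in Python. This is an illustration under stated assumptions, not a design decision. Keys are comparable and already sorted, values for idea 2 are numeric, and the sampling interval `step` and all function names are hypothetical. For idea 2, the candidate set (uniform and normal) and picking the closest fit by Kolmogorov-Smirnov distance are our own assumptions. For idea 1, each stored entry maps a boundary key to the number of records with key <= that key, so subtracting two entries counts the records between them. A query whose bounds are not stored keys is widened to the enclosing boundaries, which yields an upper bound.

```python
import bisect
from statistics import NormalDist

# --- Idea 1: index of cumulative counts (key -> number of records with key <= it) ---

def build_cumulative_index(sorted_keys, step):
    """Sample every `step`-th key of a sorted key list (hypothetical layout)."""
    boundaries, counts = [], []
    for i in range(step - 1, len(sorted_keys), step):
        key = sorted_keys[i]
        if boundaries and boundaries[-1] == key:
            continue  # duplicate key already recorded
        boundaries.append(key)
        counts.append(bisect.bisect_right(sorted_keys, key))
    # Always record the largest key so the index covers the whole key range.
    if sorted_keys and (not boundaries or boundaries[-1] != sorted_keys[-1]):
        boundaries.append(sorted_keys[-1])
        counts.append(len(sorted_keys))
    return boundaries, counts


def estimate_range(boundaries, counts, lo, hi):
    """Upper-bound estimate of the number of records with lo < key <= hi."""
    if not boundaries or hi <= lo:
        return 0
    i = bisect.bisect_right(boundaries, lo) - 1  # largest boundary <= lo
    c_lo = counts[i] if i >= 0 else 0
    j = bisect.bisect_left(boundaries, hi)  # smallest boundary >= hi
    c_hi = counts[j] if j < len(boundaries) else counts[-1]
    return c_hi - c_lo  # "count2 - count1 records between key1 and key2"


# --- Idea 2: summarize values by a distribution id plus its parameters ---

def _cdf(dist_id, params, x):
    if dist_id == "uniform":
        a, b = params
        return min(max((x - a) / (b - a), 0.0), 1.0)
    mu, sigma = params
    return NormalDist(mu, sigma).cdf(x)


def _ks_distance(dist_id, params, sorted_values):
    """Largest gap between the model CDF and the empirical CDF."""
    n = len(sorted_values)
    worst = 0.0
    for i, x in enumerate(sorted_values):
        f = _cdf(dist_id, params, x)
        worst = max(worst, abs(f - i / n), abs(f - (i + 1) / n))
    return worst


def fit_distribution(values):
    """Return (dist_id, params); assumes at least two distinct numeric values."""
    ordered = sorted(values)
    normal = NormalDist.from_samples(ordered)
    candidates = {
        "uniform": (ordered[0], ordered[-1]),
        "normal": (normal.mean, normal.stdev),
    }
    return min(candidates.items(),
               key=lambda item: _ks_distance(item[0], item[1], ordered))


def estimate_range_from_model(model, total, lo, hi):
    """Estimated number of records with lo < value <= hi, from the stored model."""
    dist_id, params = model
    return total * (_cdf(dist_id, params, hi) - _cdf(dist_id, params, lo))
```

For example, with keys `[1, 2, 2, 3, 5, 8, 8, 8, 9, 10]` and `step=3`, the index is `2 -> 3; 8 -> 8; 9 -> 9; 10 -> 10`. `estimate_range` over (2, 8] returns 5, which is exact. Over (3, 8] it also returns 5, although only 4 records fall in that range, because the lower bound is widened to the stored key 2. For idea 2, only the returned `(dist_id, params)` pair and the total record count need to be persisted.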
Blueprint information
- Status: Not started
- Approver: None
- Priority: Undefined
- Drafter: None
- Direction: Needs approval
- Assignee: None
- Definition: New
- Series goal: None
- Implementation: Unknown
- Milestone target: None