Ensure that solvedb lookups scale
There are several scaling issues with solvedbs:
1) The "best match" policy forces all matches to be examined.
A "first-found" policy that uses the first matching solution is one fix.
A "nearest match" policy for intra-solvedb affinity (i.e. prefer
answers from the same solvedb) needs to be attempted as well.
2) Failed searches MUST loop over all solvedbs.
A Bloom filter tied to each solvedb avoids unnecessary lookups.
3) (generation) Only the Providename and Filepaths indices are needed/used currently.
Certain large per-file index setups SHOULD be avoided. Perhaps just create
Packages, and then rely on lazy index creation?
4) Tuning for solvedbs is currently left at the defaults (and DB_CONFIG setup is manual).
(bdb) Setting mp_mmapsize to 25% of available memory is a win.
(bdb generate) Another win is using "nofsync". O_DIRECT should be looked at too.
(bdb generate) Another win is "private" to disable locking overhead.
5) Lookups in multiple solvedbs might benefit from multi-threading and/or map/reduce.
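
Items 1 and 2 can be illustrated together with a minimal sketch. It is not the rpm implementation: the SolveDB class, its in-memory index, and the filter sizing (8192 bits, 4 hashes) are all hypothetical stand-ins chosen for illustration. Each solvedb carries its own Bloom filter, so a failed search skips any solvedb whose filter rules the key out, and a "first-found" policy returns as soon as one solvedb answers.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes are illustrative assumptions, not tuned values."""

    def __init__(self, nbits=8192, nhashes=4):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, key):
        # Derive nhashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class SolveDB:
    """Hypothetical stand-in for one solvedb: a key -> package map plus a filter."""

    def __init__(self, name, provides):
        self.name = name
        self.index = dict(provides)
        self.filter = BloomFilter()
        for key in self.index:
            self.filter.add(key)

    def lookup(self, key):
        # In a real solvedb this would be the expensive index lookup.
        return self.index.get(key)


def first_found(solvedbs, key):
    """Return (solvedb name, package) from the first solvedb that answers."""
    for db in solvedbs:
        if key not in db.filter:
            continue  # filter rules the key out, skip the real lookup
        pkg = db.lookup(key)
        if pkg is not None:
            return db.name, pkg
    return None
```

With two solvedbs, first_found(dbs, "/usr/bin/perl") stops at the first solvedb that provides the path. A key that no solvedb provides returns None, normally without touching any index, since a Bloom filter can report false positives but never false negatives.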
Blueprint information
- Status:
- Started
- Approver:
- Jeff Johnson
- Priority:
- Medium
- Drafter:
- None
- Direction:
- Approved
- Assignee:
- Jeff Johnson
- Definition:
- Discussion
- Series goal:
- Accepted for 5.3
- Implementation:
- Good progress
- Milestone target:
- 5.3.6
- Started by:
- Jeff Johnson
- Completed by:
Whiteboard
Investigated: Pokylinux solvedb generation for ~5000 pkgs went from 24min to 2min after some tuning.
Enabling O_DIRECT is slower than leaving it disabled. There's a /proc "swappiness" (iirc) tunable instead.
Disabling fsync with "nofsync" still appears to be the biggest win.
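
Since DB_CONFIG setup is manual, the mp_mmapsize suggestion from item 4 can be sketched as a small helper that emits the corresponding DB_CONFIG line. This is only an illustration under stated assumptions: the function names are hypothetical, and total physical memory is used as a stand-in for "available memory", which the notes above do not define precisely.

```python
import os


def total_memory_bytes():
    """Total physical memory in bytes.

    Assumption: total RAM approximates "available memory". Relies on the
    SC_PHYS_PAGES and SC_PAGE_SIZE sysconf names, which exist on Linux.
    """
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def db_config_mmapsize_line(memory_bytes, fraction=0.25):
    """Build a Berkeley DB DB_CONFIG line setting mp_mmapsize.

    The 25% default follows the suggestion in the notes; the size is in bytes.
    """
    size = int(memory_bytes * fraction)
    return f"set_mp_mmapsize {size}"
```

Calling db_config_mmapsize_line(total_memory_bytes()) yields a line that could be appended to the DB_CONFIG file in the solvedb directory. The fraction should be re-measured per workload rather than taken as fixed.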