RPM

Ensure that solvedb lookups scale

Registered by Jeff Johnson on 2010-09-07

There are several scaling issues with solvedbs:

1) The "best match" policy forces all matches to be examined.
    A "first-found" policy, which uses the first matching solution, is one fix.
    A "nearest match" policy for intra-solvedb affinity (i.e. prefer
    answers from the same solvedb) needs to be attempted as well.

2) Failed searches MUST loop over all solvedbs.
    A Bloom filter tied to each solvedb avoids unnecessary lookups.

3) (generation) Only the Providename and Filepaths indices are needed/used currently.
    Building certain large per-file indices SHOULD be avoided. Perhaps just create
    Packages, and then rely on lazy index creation?

4) Tuning for solvedbs currently uses the defaults (and DB_CONFIG setup is manual).
    (bdb) Setting mp_mmapsize= to 25% of available memory is a win.
    (bdb generate) Another win is using "nofsync". O_DIRECT should be looked at too.
    (bdb generate) Another win is "private" to disable locking overhead.

5) Lookups in multiple solvedbs might benefit from multi-threading and/or map/reduce.
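
Items 1 and 2 can be illustrated with a minimal Python sketch. It is not RPM's implementation: the SolveDB class, its in-memory "provides" mapping, the filter sizes, and the first_found() helper are all hypothetical names chosen for illustration. Each solvedb carries a small Bloom filter over its capability names, so a search can skip any solvedb whose filter rules the name out, and the search returns the first hit instead of collecting every match. The optional "prefer" argument stands in for intra-solvedb affinity by probing one solvedb before the others.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest per index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class SolveDB:
    """Hypothetical stand-in for one solvedb: a Provides index plus its filter."""

    def __init__(self, name, provides):
        self.name = name
        self.provides = dict(provides)  # capability name -> package name
        self.bloom = BloomFilter()
        for capability in self.provides:
            self.bloom.add(capability)
        self.probes = 0  # number of real index lookups performed

    def lookup(self, capability):
        self.probes += 1
        return self.provides.get(capability)


def first_found(solvedbs, capability, prefer=None):
    """Return (solvedb name, package) for the first match, or None."""
    # Stable sort: the preferred solvedb (if any) moves to the front,
    # the remaining solvedbs keep their original order.
    ordered = sorted(solvedbs, key=lambda db: db is not prefer)
    for db in ordered:
        if capability not in db.bloom:
            continue  # filter says "definitely absent": skip the real lookup
        package = db.lookup(capability)
        if package is not None:
            return db.name, package  # first-found: stop at the first hit
    return None


if __name__ == "__main__":
    dbs = [
        SolveDB("base", {"libc.so.6": "glibc"}),
        SolveDB("extras", {"libfoo.so.1": "foo", "libc.so.6": "glibc-compat"}),
    ]
    print(first_found(dbs, "libc.so.6"))
    print(first_found(dbs, "libc.so.6", prefer=dbs[1]))
    print(first_found(dbs, "no-such-capability"))
```

A Bloom filter can report false positives, so a hit in the filter still needs the real index lookup; only a miss lets the search skip a solvedb. The filter must also be rebuilt or updated whenever its solvedb is regenerated.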

Blueprint information

Status: Started

Approver: Jeff Johnson

Priority: Medium

Drafter: None

Direction: Approved

Assignee: Jeff Johnson

Definition: Discussion

Series goal: Accepted for 5.3

Implementation: Good progress

Milestone target: 5.3.6

Started by: Jeff Johnson on 2010-09-07

Whiteboard

Investigated: Pokylinux solvedb generation for ~5000 pkgs went from 24min to 2min after some tuning fiddle-ups.
Enabling O_DIRECT is slower than not enabling it. There's a /proc "swappiness" (iirc) tunable instead.
Disabling fsync with "nofsync" still appears to be the biggest win.
