Speed up updates and upgrades
Installing/updating with apt is too serialized (mostly because it is transaction-based): all packages are first downloaded, then extracted, then configured. In some cases certain commands (the man-db update, the initramfs update, etc.) are executed multiple times during a single install/upgrade.
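For illustration, here is a minimal sketch of a less serialized flow, in which one thread keeps downloading while the main thread unpacks whatever has already arrived, and configuration is deferred to a single pass at the end. The package names and URLs are hypothetical, and a real implementation would also have to respect dependency ordering before handing anything to dpkg.

```python
import queue
import subprocess
import threading
import urllib.request

# Hypothetical work list; real apt would derive this from its resolver.
PACKAGES = [
    ("man-db", "http://archive.example/pool/man-db_2.6.0-2_amd64.deb"),
    ("grub-pc", "http://archive.example/pool/grub-pc_1.99-4_amd64.deb"),
]

downloaded = queue.Queue()

def fetch():
    # Producer: download .debs and hand each one to the installer
    # as soon as it is on disk.
    for name, url in PACKAGES:
        path = "/var/cache/apt/archives/%s.deb" % name
        urllib.request.urlretrieve(url, path)
        downloaded.put(path)
    downloaded.put(None)  # sentinel: nothing left to download

threading.Thread(target=fetch, daemon=True).start()

# Consumer: unpack packages while later downloads are still running,
# then configure everything (and run triggers) in one final pass.
while True:
    path = downloaded.get()
    if path is None:
        break
    subprocess.run(["dpkg", "--unpack", path], check=True)
subprocess.run(["dpkg", "--configure", "--pending"], check=True)
```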
The bug suggesting something similar is https:/
The Debian GSoC project "APT/Dpkg Transaction Ordering for Safety and Performance" is also relevant (see the session notes below).
Blueprint information
- Status: Not started
- Approver: None
- Priority: Undefined
- Drafter: None
- Direction: Needs approval
- Assignee: None
- Definition: Discussion
- Series goal: None
- Implementation: Unknown
- Milestone target: None
Whiteboard
Actions:
- [evfool] gather data (e.g. from the auto-upgrade tester) on which parts of an upgrade take how long (number of times things are run, time they take to run, time spent downloading, unpacking, configuring, and running postinst triggers)
- [evfool] collect data using dpkg's log, to see what is possible with it (see the sketch after this list)
- [] involve the community: set up a wiki page explaining how to send us profiling data, and use that page to publish the resulting data
- [mvo] test using the eatmydata package to see how much fsync() and friends contribute to the time
- [mvo] test different options (--no-triggers, default, etc.) with the auto-upgrade tester to see what difference they make (see "dpkg trigger usage" in man apt.conf)
- [cjwatson] optimize the initramfs generation (first figure out what takes time)
- [mvo?] make unattended-upgrade use aptdaemon(?) - needs a "future state" concept
- [cjwatson] profile mandb some more and improve its performance
- [cjwatson] profile grub-mount and improve its performance
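As a starting point for the dpkg log action above, here is a rough sketch of mining /var/log/dpkg.log for a per-package time breakdown. It assumes the classic log format (timestamp, action, package, versions) and attributes the gap between consecutive log lines to the package named on the later line; that is only an approximation, since dpkg logs state changes rather than start/stop events.

```python
from collections import defaultdict
from datetime import datetime

spent = defaultdict(float)  # package -> seconds attributed to it
prev = None

with open("/var/log/dpkg.log") as log:
    for line in log:
        parts = line.split()
        if len(parts) < 4:
            continue
        ts = datetime.strptime(parts[0] + " " + parts[1],
                               "%Y-%m-%d %H:%M:%S")
        action = parts[2]
        if action == "status":
            # e.g. "... status unpacked man-db 2.6.0-2"
            pkg = parts[4] if len(parts) > 4 else None
        elif action in ("install", "upgrade", "configure",
                        "remove", "trigproc"):
            # e.g. "... configure man-db 2.6.0-2 2.6.0-2"
            pkg = parts[3]
        else:
            pkg = None  # "startup ..." and similar lines
        if prev is not None and pkg is not None:
            spent[pkg] += (ts - prev).total_seconds()
        prev = ts

# Print the 20 packages that account for the most time.
for pkg, secs in sorted(spent.items(), key=lambda kv: -kv[1])[:20]:
    print("%8.1f s  %s" % (secs, pkg))
```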
-------
Notes from the session:
updates/upgrades are slow
why are these slow?
- triggers solve the problem of some repeated actions
- we don't have good data on what takes long and why
- someone should mine data (from the AutoUpgradetester, which has timestamps) to see which actions take the most time
- ldconfig has triggers; it would be possible to run dpkg without triggers, but that currently has a bug with pre-depends
- the most obvious and first step would be parallelizing downloading and installing
- we do have some preliminary data showing which packages contribute how much time, but we will need more
- we have data for the upgrade from maverick to natty, with processing time per package, e.g. 87 seconds for man-db, 58 seconds for dpkg-exec
- man-db would probably be more efficient if it were run fewer times
- the initramfs is rebuilt a number of times while kernel images and headers are installed
- it would be interesting to measure how long packages take to install when not counting disk I/O time (using the eatmydata package; see the sketch at the end of these notes)
- how much time does the download take? debdelta was a separate discussion and should make a huge difference, but debdelta is CPU-intensive, so parallelizing download and dpkg will not work with debdelta
- the GSoC apt-ordering project is about optimizing the ordering for the fewest broken packages at any given time, or for minimal dpkg invocations for the most secure updates; that is optimization for a different use case
- pre-depends is sometimes applied incorrectly
- it would be worth trying to defer configure, and only configure when pre-dependencies require it
- dpkg with many packages consumes a lot of memory and gets slow (old low-memory mode may have better cache usage due to better locality of reference)
- if status file parsing is a significant factor, perhaps we should consider reviving ijackson's flex scanner experiment
- test more of universe (maybe in multiple steps)
- the problem exists only for upgrading, because updating done properly in the background wouldn't affect the user
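As a concrete form of the eatmydata measurement mentioned above, here is a minimal sketch that installs the same (hypothetical) set of packages twice, once normally and once under the eatmydata wrapper, which turns fsync() and friends into no-ops, and compares the wall-clock times. This is meant for a throwaway VM or chroot, not a production system.

```python
import subprocess
import time

PACKAGES = ["man-db", "grub-pc"]  # hypothetical test set

def timed_install(wrapper):
    # Start from a clean slate; -d pre-downloads the .debs so that
    # network time does not pollute the measurement.
    subprocess.run(["apt-get", "-y", "remove"] + PACKAGES, check=True)
    subprocess.run(["apt-get", "-y", "-d", "install"] + PACKAGES,
                   check=True)
    start = time.monotonic()
    subprocess.run(wrapper + ["apt-get", "-y", "install"] + PACKAGES,
                   check=True)
    return time.monotonic() - start

plain = timed_install([])
nofsync = timed_install(["eatmydata"])
print("plain:      %6.1f s" % plain)
print("eatmydata:  %6.1f s" % nofsync)
print("fsync cost: %6.1f s (%.0f%% of the plain run)"
      % (plain - nofsync, 100 * (plain - nofsync) / plain))
```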