Prototype use of traces for QEMU speed improvements
This blueprint covers prototyping a more significant change to QEMU's internals, one that might produce a better performance improvement or lay the foundation for future improvements by giving more advanced optimisations room to work. The problem we are trying to address is that TCG basic blocks are typically very short, because they end at any branch, which leaves little opportunity for optimisations to take effect. We therefore want to prototype some form of 'trace' setup that lets code generation and optimisation work on a larger chunk of code. For a month's worth of work we would hope to come out with a prototype suitable for posting upstream as an 'RFC' patchset (for example, we might make any required frontend changes only to the ARM frontend, and backend changes only to the x86 backend). Producing a completely mergeable patchset would be a separate blueprint and probably another month of work.
Other people in QEMU upstream are already looking at generic TCG speed improvements (e.g. Aurelien, Kirill), so we need to make sure we coordinate with them.
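To make the motivation concrete, here is a minimal sketch of the difference between blocks that end at any branch and a trace that follows one predicted path across branches. Everything in it is hypothetical and for illustration only: the `Insn` type, the `jmp`/`bcond` opcode names, the `taken` prediction map and both functions are toy constructs, not QEMU or TCG data structures, and the blueprint does not prescribe this design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    """A toy guest instruction (hypothetical, not a QEMU/TCG structure)."""
    op: str
    target: Optional[int] = None  # branch destination, as an index into the program


BRANCH_OPS = {"jmp", "bcond"}  # assumed opcode names for this sketch


def split_basic_blocks(program):
    """Split the program into blocks that end at any branch."""
    blocks, current = [], []
    for pc, insn in enumerate(program):
        current.append(pc)
        if insn.op in BRANCH_OPS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def build_trace(program, start, taken, max_len=32):
    """Follow one predicted path across branches.

    `taken` maps the pc of a conditional branch to its predicted direction
    (an assumption of this sketch). The trace stops when it would revisit
    an instruction, leave the program, or exceed max_len.
    """
    trace, seen, pc = [], set(), start
    while 0 <= pc < len(program) and pc not in seen and len(trace) < max_len:
        seen.add(pc)
        trace.append(pc)
        insn = program[pc]
        if insn.op == "jmp":
            pc = insn.target
        elif insn.op == "bcond" and taken.get(pc, False):
            pc = insn.target
        else:
            pc += 1
    return trace


PROGRAM = [
    Insn("add"),        # 0
    Insn("bcond", 4),   # 1: predicted taken
    Insn("sub"),        # 2
    Insn("jmp", 5),     # 3
    Insn("mul"),        # 4
    Insn("add"),        # 5
    Insn("bcond", 0),   # 6: loop back edge, predicted taken
    Insn("ret"),        # 7
]

blocks = split_basic_blocks(PROGRAM)
trace = build_trace(PROGRAM, start=0, taken={1: True, 6: True})
print("blocks:", blocks)
print("trace: ", trace)
```

In this toy program the longest block holds three instructions, while the trace covers five instructions spanning two conditional branches, which is the kind of larger unit an optimiser could work on. A real design would also need side exits for the cases where a branch does not go the predicted way; the sketch leaves that out.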
Blueprint information
- Status: Not started
- Approver: Michael Hope
- Priority: Not
- Drafter: Peter Maydell
- Direction: Needs approval
- Assignee: None
- Definition: Approved
- Series goal: None
- Implementation: Deferred
- Milestone target: backlog
- Started by:
- Completed by:
Work Items
Become familiar with QEMU's current codegen approach: TODO
Sketch out a design for adding traces: TODO
Propose upstream, collect feedback: TODO
Implement prototype 1: TODO
Implement prototype 2: TODO
Implement prototype 3: TODO
Implement prototype 4: TODO
Benchmark and instrument to see how effective it is: TODO
Tweaks based on benchmarking results: TODO
Submit RFC patch series upstream: TODO