Performance inside GCC
This session covers non-NEON performance improvements inside GCC. It includes current investigations by Linaro, areas that other groups are working on, and potential long-term topics that could help performance on ARM.
Whiteboard
[This is still in progress; I'll update it a bit over the following days.]
1. Tune instruction scheduling for ARM
GCC has various options related to instruction scheduling, but how much can ARM benefit from them?
- Swing Modulo Scheduling: speed drops noticeably (-2% on EEMBC) when -fmodulo-sched is turned on (FSF GCC r165607). Why? Is it easy to improve for ARM?
- Selective scheduling:
* Some speed gain is obtained with -fselective-
* -fsel-sched-
- sched-pressure:
* "-funroll-loops -fsched-pressure" (+11.3%) is only marginally better than "-funroll-loops" alone (+11.2%). Can we improve "-fsched-pressure" further on ARM?
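The comparisons above could be scripted along these lines. This is only a sketch: the "-O2" baseline, the configuration names, the default compiler name "gcc" (a cross compiler would be used for ARM), and the helper names are all assumptions, and the notes do not say how the EEMBC percentages were computed, so the formula below (baseline time divided by candidate time, minus one) is just one common convention.

```python
# Sketch for comparing scheduling flag sets; nothing here runs the compiler.
# Assumption: "-O2" is the baseline optimization level (not stated in the notes).
BASELINE_FLAGS = ["-O2"]

# Only flags that are spelled out in full in the notes are listed here.
CANDIDATE_FLAGS = {
    "modulo-sched": ["-fmodulo-sched"],
    "unroll": ["-funroll-loops"],
    "unroll+sched-pressure": ["-funroll-loops", "-fsched-pressure"],
}


def gcc_command(source, output, extra_flags, compiler="gcc"):
    """Build the compile command for one configuration as an argument list."""
    return [compiler, *BASELINE_FLAGS, *extra_flags, "-o", output, source]


def speed_change_percent(baseline_seconds, candidate_seconds):
    """Relative speed change versus the baseline; positive means faster."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


def rank_configurations(baseline_seconds, timings):
    """Return (name, percent change) pairs sorted from best to worst."""
    changes = {
        name: speed_change_percent(baseline_seconds, seconds)
        for name, seconds in timings.items()
    }
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)
```

The command lists can be passed to subprocess.run on a build machine, and the measured run times fed into rank_configurations to see which flag set helps or hurts relative to the baseline.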
2. Avoid speed regression
- Continuous speed evaluation: it might be impractical to measure speed on every commit, but we could check performance numbers weekly. Can the Linaro infrastructure team help with this?
- Avoid big changes in one commit: small-changes-
- Merging from upstream or applying patches from elsewhere: what should we do if we find a speed regression? For example, after merging from FSF 4.5.1, Linaro GCC 4.5 shows speed regressions on some EEMBC cases. Here are some options for handling it:
* Assign someone to look into this speed regression,
* Open a ticket for this speed regression to track it,