Optimize selected key glibc functions for multi-core architectures
These optimizations include advanced open source modifications to memcpy, strcmp, and malloc that fine-tune memory/cache accesses and heap locking.
For instance, memcpy and strcmp can benefit from new cache organizations, hardware multi-threading, and awareness of data layout. malloc performance can benefit from knowledge of the architecture's ability to execute concurrently, and from tuning that reduces synchronization stalls.
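One common way to reduce heap-lock contention is to keep a small per-thread cache of freed blocks, so most allocations never touch the shared heap. The following is a minimal sketch of that idea only, not glibc's malloc implementation. The names block_alloc, block_free, BLOCK_SIZE, and CACHE_MAX are hypothetical, and the sketch assumes fixed-size blocks and a C11 compiler with _Thread_local.

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 64   /* hypothetical fixed block size */
#define CACHE_MAX  32   /* hypothetical cap on cached blocks per thread */

/* A freed block is reused as a node of a singly linked free list. */
typedef struct free_node {
    struct free_node *next;
} free_node;

/* Each thread has its own list, so no lock is needed to touch it. */
static _Thread_local free_node *tl_free_list = NULL;
static _Thread_local size_t tl_cached = 0;

/* Allocate one block: take it from the thread-local cache if possible,
 * otherwise fall back to the shared heap (which may take a lock). */
void *block_alloc(void)
{
    free_node *n = tl_free_list;
    if (n != NULL) {
        tl_free_list = n->next;
        tl_cached--;
        return n;
    }
    return malloc(BLOCK_SIZE);
}

/* Free one block: keep it in the thread-local cache while there is room,
 * otherwise return it to the shared heap. */
void block_free(void *p)
{
    if (p == NULL)
        return;
    if (tl_cached < CACHE_MAX) {
        free_node *n = p;
        n->next = tl_free_list;
        tl_free_list = n;
        tl_cached++;
    } else {
        free(p);
    }
}
```

A block freed and then reallocated by the same thread comes straight back from the cache, without any synchronization.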
Whiteboard
Because the proposed changes are intrusive and intercept core run-time library routines, it was decided to analyze the benefit of the strcmp optimizations first. ld.so should also benefit from these optimizations.
With a shared cache architecture, the hashing algorithm used for linking can take advantage of an optimized strncmp when ld is aware of shared symbols already in the cache.
strncmp could potentially be checked into Edgy if it passes testing and review; memcpy would follow.
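To illustrate why string comparison matters for linking: a hashed symbol lookup typically hashes the requested name, walks the matching bucket chain, and confirms each candidate with a string comparison. The sketch below uses the classic System V ELF hash function. The table layout and the names toy_symbol, toy_symtab, toy_symtab_init, toy_lookup, and NBUCKETS are hypothetical simplifications, not the actual ld.so data structures, and plain strcmp stands in for the optimized routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 8      /* hypothetical bucket count */
#define NO_SYM   (-1)   /* end-of-chain / not-found marker */

/* Classic System V ABI ELF symbol hash. */
uint32_t elf_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0, g;
    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

typedef struct {
    const char *name;
    int next;           /* next symbol index in the same bucket, or NO_SYM */
} toy_symbol;

typedef struct {
    toy_symbol *syms;
    size_t nsyms;
    int buckets[NBUCKETS];
} toy_symtab;

/* Build the bucket chains over an existing symbol array. */
void toy_symtab_init(toy_symtab *t, toy_symbol *syms, size_t n)
{
    t->syms = syms;
    t->nsyms = n;
    for (size_t i = 0; i < NBUCKETS; i++)
        t->buckets[i] = NO_SYM;
    for (size_t i = 0; i < n; i++) {
        uint32_t b = elf_hash(syms[i].name) % NBUCKETS;
        syms[i].next = t->buckets[b];
        t->buckets[b] = (int)i;
    }
}

/* Return the symbol index for name, or NO_SYM. Every candidate in the
 * bucket costs one string comparison, so this is where a faster
 * strcmp/strncmp would pay off. */
int toy_lookup(const toy_symtab *t, const char *name)
{
    uint32_t b = elf_hash(name) % NBUCKETS;
    for (int i = t->buckets[b]; i != NO_SYM; i = t->syms[i].next) {
        if (strcmp(t->syms[i].name, name) == 0)
            return i;
    }
    return NO_SYM;
}
```

The hash only narrows the search to one bucket; the final match is always decided by comparing strings, which is why lookup-heavy linking is sensitive to comparison speed.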
The proposed code would:
- take care of the alignment and length of the string
- prefetch into cache if the data is reused or threaded
- use the correct optimizing compiler flags and intrinsics
- use SSE/SSE2
- account for cache and cache-line size
- reduce branch mispredictions
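Several of the points above can be sketched in one routine. This is an illustration only, not the proposed glibc code: the name sse2_strcmp_sketch is hypothetical, 4 KiB pages are assumed, and __builtin_ctz assumes GCC or Clang. It compares 16 bytes per iteration with SSE2 intrinsics, with a single branch per chunk in the common case. It reads past the terminator only within the current page, which is the usual trick in optimized string routines but falls outside strict ISO C. Without SSE2 it falls back to a plain byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

#define PAGE_SIZE_ASSUMED 4096u   /* assumption: 4 KiB pages */

/* Nonzero if a 16-byte load at p stays inside p's page. */
static int load16_is_page_safe(const unsigned char *p)
{
    return ((uintptr_t)p & (PAGE_SIZE_ASSUMED - 1)) <= PAGE_SIZE_ASSUMED - 16;
}

/* strcmp-like comparison; result is <0, 0, or >0. */
int sse2_strcmp_sketch(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    for (;;) {
#if defined(__SSE2__)
        if (load16_is_page_safe(a) && load16_is_page_safe(b)) {
            const __m128i zero = _mm_setzero_si128();
            __m128i va = _mm_loadu_si128((const __m128i *)a);
            __m128i vb = _mm_loadu_si128((const __m128i *)b);
            /* Bit i set where a[i] == b[i]. */
            unsigned eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
            /* Bit i set where a[i] is the terminator. */
            unsigned nul = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, zero));
            /* Stop at the first mismatch or terminator in this chunk. */
            unsigned stop = (~eq | nul) & 0xFFFFu;
            if (stop == 0) {
                a += 16;
                b += 16;
                continue;
            }
            unsigned idx = (unsigned)__builtin_ctz(stop);
            return (int)a[idx] - (int)b[idx];
        }
#endif
        /* Near a page boundary, or no SSE2: compare one byte. */
        if (*a != *b || *a == 0)
            return (int)*a - (int)*b;
        a++;
        b++;
    }
}
```

Prefetching and cache-line tuning are left out because they depend on the target microarchitecture and would need measurement.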
Analyze 64-bit code and intrinsics.