Get rid of libm dependency
Remove the dependency on a libm on the device. This dependency is tolerable for hosts like desktop Linux or Mac, but intolerable on embedded systems, which might not have a linkable libm installed. In addition, providing implementations of all required functions enables the aggressive inlining that is often desired for OpenCL C programs.
Blueprint information
- Status: Not started
- Approver: Pekka Jääskeläinen
- Priority: High
- Drafter: Pekka Jääskeläinen
- Direction: Needs approval
- Assignee: None
- Definition: Discussion
- Series goal: Accepted for 1.0
- Implementation: Unknown
- Milestone target: 1.0
- Started by
- Completed by
Whiteboard
The math lib calls are currently handled with LLVM builtins that are expected to be lowered either to target device instructions directly or to library calls. The problem with the latter is that the system libraries can be linked dynamically in which case inlining won't work.
I think it's better that the 'generic' kernel library does not use any intrinsics or builtins but calls the pocl versions of the libm functions to produce a self-contained kernel binary that can be aggressively inlined. The device-specific kernel libraries can override the functions as they wish, for example, call the builtins or even inline assembly in case it optimizes the speed for the target.
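The layering described above could look roughly like the following sketch. All names here (`pocl_generic_fmaxf`, `pocl_fmaxf`, `POCL_DEVICE_HAS_FMAXF`) are hypothetical and chosen only for illustration; they are not existing pocl identifiers. The generic version is plain C with no libm call, so it can be inlined into the kernel, while a device-specific library may opt into the compiler builtin instead.

```c
/* Generic, self-contained version: no intrinsics, no libm call.
 * Hypothetical name, for illustration only. */
static inline float pocl_generic_fmaxf(float a, float b)
{
    /* A NaN compares unequal to itself; return the other operand. */
    if (a != a) return b;
    if (b != b) return a;
    return a > b ? a : b;
}

#ifdef POCL_DEVICE_HAS_FMAXF
/* Assumed opt-in macro: a device-specific kernel library that has a
 * fast native max can route the call to the compiler builtin. */
static inline float pocl_fmaxf(float a, float b)
{
    return __builtin_fmaxf(a, b);
}
#else
/* Default: use the inlineable generic implementation. */
static inline float pocl_fmaxf(float a, float b)
{
    return pocl_generic_fmaxf(a, b);
}
#endif
```

Because both variants are `static inline`, a kernel that calls `pocl_fmaxf` carries no external symbol reference in the generic case.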
The math library functions we need are mainly the trigonometric functions and max etc. We could implement them fully from scratch, but it is better to reuse code from some liberally licensed math library implementation.
For example:
http://
csanchezdll:
Having the math functions in bytecode would be good, as it would allow inlining and create a self-contained executable kernel. The drawback of this is that it will not use target-specific features (some targets might have a transcendental function unit that implements trigonometric calculations, for example). Of course, the "default" does not need such optimizations.
On the other hand, the built-ins are "kind-of" standard... and if the default target is only going to be used by native "run-on-host" devices (and this only when there is no target-specific library), then the built-ins can be fine as well, because on a native host you have compiled pocl itself, so you can expect some C library support. Let us discuss the pros and cons of each approach.
Regarding the "implementation" itself, I find implementing the functions ourselves a bit "overkill". There are probably ready-made libraries, the simplest I can think of being newlib. It can probably compile out of the box with clang generating a bytecode library, so I would rather *use* newlib... implementing the trigs is just adding another source of errors to pocl.
This means of course that pocl depends on newlib... no big deal, and what's more, it suggests not making the default library use it, but only the target-dependent libraries that require it... so if target X cannot use built-ins, it uses newlib, and if no newlib is present during configure... well, target X does not get built, but pocl as a whole would not depend on anything new.
Pekka:
Requiring Newlib to be on the device/host produces more trouble. A point in embedding the required functions inside pocl is to make pocl self-contained (aside from the LLVM/Clang dependency), that is, to make it easily portable to various platforms (hosts and devices), plus the inlining benefits. I just got rid of the gcc dependency in the 'ld' branch and I'd like to get rid of the libm dependency (be it Newlib's or the native one) too.
Newlib is quite big and contains the whole C library, which pocl does not need. It would require porting the whole of Newlib to the target in question whenever one wants to use pocl on a host/device. I see that as quite a bit more "overkill" than just copying the functions we need from some BSD/MIT licensed library, if one is found. Anyway, it seems the math lib of Newlib is nicely separable, so we could include it in pocl to avoid requiring Newlib to be installed? The license is a bit unclear but I think it's a BSD license for the libm part. We could just copy the 'math' dir from Newlib to the source tree and then the various kernel lib implementations can cherry-pick the code they need from there (at source code level due to the different bitcode targets).
For the question what should be the default for the "generic" library...
A major point IMHO for the "inlineable versions by default" is the exploitation of inter-WI parallelism with vectors or long instructions, which is ruined if you have a library call in the kernel. Avoiding such calls can lead to a more parallelizable default generic lib (the ability to maybe execute some parts of sin/cos, for example, for multiple WIs using parallel instructions), which should be good in the "performance portability" sense.
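To illustrate the point, here is a minimal sketch of a work-group loop whose body is an inlineable sine approximation. Everything in it is an assumption made for illustration: the names `sketch_sin_small` and `sketch_run_work_group`, the fixed `WG_SIZE`, and the choice of a truncated Taylor polynomial that is only accurate for small |x| (roughly |x| <= 0.5) and does no range reduction. It is not the Newlib algorithm. The point is only that, with no opaque library call in the loop body, the compiler sees straight-line arithmetic it is free to vectorize across work-items.

```c
#define WG_SIZE 8  /* hypothetical work-group size */

/* Truncated Taylor series x - x^3/3! + x^5/5! - x^7/7! in Horner form.
 * Assumption: input is small; no range reduction is performed. */
static inline float sketch_sin_small(float x)
{
    float x2 = x * x;
    return x * (1.0f + x2 * (-1.0f / 6.0f
                    + x2 * (1.0f / 120.0f
                    + x2 * (-1.0f / 5040.0f))));
}

/* One iteration per work-item (WI). Since sketch_sin_small is inlined,
 * the loop body contains only multiplies and adds. */
void sketch_run_work_group(const float *in, float *out)
{
    for (int wi = 0; wi < WG_SIZE; ++wi)
        out[wi] = sketch_sin_small(in[wi]);
}
```

If the loop body instead called a dynamically linked `sinf`, each iteration would contain a call the compiler cannot see through, which generally blocks this kind of cross-WI vectorization.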
So. I propose:
1. Copy the required math implementations from Newlib
2. Use them in the generic implementation and assume the device-optimized libs use whatever is better for them
I can implement this if you agree. I would like to hear Erik's comments too.