Implement constant vec permute operation for the vext instruction
The vext instruction can be supported through the constant vector permute infrastructure that GCC has in FSF 4.7 and above. That infrastructure already supports all the other constant vector permute operations, but it does not yet support this one.
Blueprint information
- Status: Complete
- Approver: Michael Hope
- Priority: Medium
- Drafter: Ramana Radhakrishnan
- Direction: Approved
- Assignee: Christophe Lyon
- Definition: Approved
- Series goal: Accepted for 4.7
- Implementation: Implemented
- Milestone target: None
- Started by: Michael Hope
- Completed by: Christophe Lyon
Whiteboard
Background
GCC has generic vector permute support. The programmer can use it in two ways: directly, through the __builtin_shuffle language extension, or indirectly, by relying on the auto-vectorizer to generate vector permutes.
If the optimizers can detect a vector permute that can be done with a "constant" vector i.e. something like the Neon vrev{16/32/64} operations which can be expressed as per the testcase in gcc.target/
However, the vtbl and vtbx instructions are expensive because they operate only on vector registers, so the mask for the generic shuffle must first be loaded into one. For instance, look at the example posted here to show the difference in code generated for the vrev cases http://
The aim of this task is to do the same for the permute operations that are allowed with the Neon vext instruction.
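To make the target concrete, here is a minimal sketch of a permute that has the shape of a vext operation, written with GCC's generic vector extension. The type name u16x8 and the helpers ext3 and ext3_matches_reference are hypothetical names chosen for illustration. The selector {3, ..., 10} is assumed to correspond to a 16-bit vext with an immediate of 3 on quad registers; whether a given compiler actually emits vext for it depends on the backend support this blueprint describes.

```c
#include <assert.h>
#include <stdint.h>

/* GCC generic vector type: eight 16-bit lanes, 128 bits in total. */
typedef uint16_t u16x8 __attribute__((vector_size(16)));

/* Hypothetical helper: take the last five lanes of a followed by the
   first three lanes of b. Selector indices 0..7 pick lanes of a and
   indices 8..15 pick lanes of b. */
u16x8 ext3(u16x8 a, u16x8 b)
{
    const u16x8 mask = {3, 4, 5, 6, 7, 8, 9, 10};
    return __builtin_shuffle(a, b, mask);
}

/* Returns 1 when ext3 produces lanes 3..10 of the concatenation a:b. */
int ext3_matches_reference(void)
{
    u16x8 a = {0, 1, 2, 3, 4, 5, 6, 7};
    u16x8 b = {8, 9, 10, 11, 12, 13, 14, 15};
    u16x8 r = ext3(a, b);
    for (int i = 0; i < 8; i++) {
        if (r[i] != (uint16_t)(i + 3))
            return 0;
    }
    return 1;
}
```

Because the selector is a compile-time constant, the backend has the chance to match it against a single instruction instead of loading a mask for vtbl.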
At a broad level, the tasks should therefore be the following.
* Understand the Neon vext instruction and generate testcases using the generic __builtin_shuffle mechanism.
* Understand the backend implementation. Look at how the functions arm_evpc_neon_vuzp, arm_evpc_neon_vrev, etc. are used in arm.c
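The recognition step that such a backend function would need can be sketched in plain C. This is an illustrative assumption and not the code in arm.c: match_ext_selector is a hypothetical name, the selector is modelled as a byte array, and the sketch covers only the two-input case without any endianness handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recognizer, not the GCC source. Returns the extract
   offset n (1 .. nelt-1) when sel is {n, n+1, ..., n+nelt-1}, i.e. a
   window of nelt consecutive lanes over the concatenation of the two
   inputs. Returns -1 for any other selector. */
int match_ext_selector(const unsigned char *sel, size_t nelt)
{
    if (nelt == 0)
        return -1;

    unsigned first = sel[0];

    /* An offset of 0 is a plain copy of the first input, and an offset
       of nelt or more would run past the end of the second input. */
    if (first == 0 || first >= nelt)
        return -1;

    /* Every following index must continue the consecutive run. */
    for (size_t i = 1; i < nelt; i++) {
        if (sel[i] != first + i)
            return -1;
    }
    return (int)first;
}
```

A selector such as {3, 4, 5, 6, 7, 8, 9, 10} is accepted with offset 3, while a reversal such as {7, 6, 5, 4, 3, 2, 1, 0} is rejected and would be left to other matchers or to the generic fallback.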