AVX / NEON Intrinsic Functions: When Should They Be Used?
Théo Boivin (CERFACS), Joeffrey Legaux (CERFACS)

TL;DR
This paper evaluates when AVX/NEON intrinsic functions are worth using for code optimization and offers guidance by architecture, OS, and compiler. Intrinsics prove highly efficient for conditional branching, but in many cases compiler auto-vectorization suffices.
Contribution
It introduces a benchmark to assess intrinsic function effectiveness across different configurations, aiding developers in making informed optimization decisions.
Findings
Intrinsic functions are highly efficient in conditional branching: the intrinsic version runs in around 5% of the plain code's execution time.
Auto-vectorization by compilers often makes intrinsic functions unnecessary.
Performance gains vary depending on architecture and compiler.
Abstract
A cross-configuration benchmark is proposed to explore the capabilities and limitations of AVX / NEON intrinsic functions in the general setting of a development project where a vectorisation strategy is required to optimise the code. The main aim is to help developers decide when to use intrinsic functions, depending on the OS, architecture and/or available compiler. Intrinsic functions were observed to be highly efficient in conditional branching, with the execution time of the intrinsic version reaching around 5% of that of the plain code. However, intrinsic functions were found to be unnecessary in many cases, as the compilers already auto-vectorise the code well.
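To make the conditional-branching case concrete, here is a minimal sketch in C of the kind of transformation involved. It is not the paper's benchmark: the kernel (double positive values, negate the rest) and the function names `kernel_plain`, `kernel_avx`, `kernel_neon` and `kernel_intrinsic` are hypothetical, chosen only for illustration. The sketch assumes GCC or Clang, since it relies on `__attribute__((target("avx")))` and `__builtin_cpu_supports`. The per-element `if` is replaced by a comparison that produces a lane mask, followed by a select (`_mm256_blendv_ps` on AVX, `vbslq_f32` on NEON).

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_AVX_PATH 1
#elif defined(__aarch64__)
#include <arm_neon.h>
#define HAVE_NEON_PATH 1
#endif

/* Plain version: one data-dependent branch per element (hypothetical kernel). */
void kernel_plain(const float *x, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > 0.0f) out[i] = 2.0f * x[i];
        else             out[i] = -x[i];
    }
}

#ifdef HAVE_AVX_PATH
/* AVX version: compute both outcomes for 8 floats, then select with a mask. */
__attribute__((target("avx")))
static void kernel_avx(const float *x, float *out, size_t n) {
    const __m256 zero = _mm256_setzero_ps();
    const __m256 two  = _mm256_set1_ps(2.0f);
    const __m256 sign = _mm256_set1_ps(-0.0f);            /* sign bit only */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v    = _mm256_loadu_ps(x + i);
        __m256 mask = _mm256_cmp_ps(v, zero, _CMP_GT_OQ); /* lanes where x > 0 */
        __m256 then = _mm256_mul_ps(v, two);              /* 2 * x */
        __m256 els  = _mm256_xor_ps(v, sign);             /* -x via sign flip */
        /* blendv takes the second operand where the mask is set. */
        _mm256_storeu_ps(out + i, _mm256_blendv_ps(els, then, mask));
    }
    kernel_plain(x + i, out + i, n - i);                  /* scalar remainder */
}
#endif

#ifdef HAVE_NEON_PATH
/* NEON version: same idea on 4 floats, using a bitwise select. */
static void kernel_neon(const float *x, float *out, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t v    = vld1q_f32(x + i);
        uint32x4_t  mask = vcgtq_f32(v, vdupq_n_f32(0.0f)); /* lanes where x > 0 */
        /* vbslq takes the first value operand where mask bits are set. */
        vst1q_f32(out + i, vbslq_f32(mask, vmulq_n_f32(v, 2.0f), vnegq_f32(v)));
    }
    kernel_plain(x + i, out + i, n - i);                  /* scalar remainder */
}
#endif

/* Dispatcher: use an intrinsic path when one is available, else plain code. */
void kernel_intrinsic(const float *x, float *out, size_t n) {
    assert(x != NULL && out != NULL);
#if defined(HAVE_AVX_PATH)
    if (__builtin_cpu_supports("avx")) { kernel_avx(x, out, n); return; }
#elif defined(HAVE_NEON_PATH)
    kernel_neon(x, out, n);
    return;
#endif
    kernel_plain(x, out, n);
}
```

Both outcomes are computed for every lane and the mask picks one, so the loop body contains no data-dependent jump. Whether this beats the compiler's own output for `kernel_plain` depends on the architecture, compiler and flags, which is exactly what a cross-configuration benchmark is meant to measure.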
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Logic, programming, and type systems
