The ARM Scalable Vector Extension
Nigel Stephens, Stuart Biles, Matthias Boettcher, Jacob Eapen, Mbou Eyole, Giacomo Gabrielli, Matt Horsnell, Grigorios Magklis, Alejandro Martinez, Nathanael Premillieu, Alastair Reid, Alejandro Rico, Paul Walker

TL;DR
The ARM Scalable Vector Extension (SVE) extends AArch64 vector processing with scalable vector lengths. It targets a wide range of application domains and improves compiler auto-vectorization, without requiring software to be rewritten when the vector length changes.
Contribution
It introduces a scalable, vector-length-agnostic architecture that improves auto-vectorization and allows multiple implementations with different vector lengths, addressing the computational demands of high-performance computing and related domains.
Findings
Supports implementation-chosen vector lengths from 128 to 2048 bits, in multiples of 128 bits
Enables code to scale automatically across vector lengths
Introduces features to improve auto-vectorization
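To illustrate what "code that scales automatically across vector lengths" means, here is a minimal scalar sketch of a vector-length-agnostic (VLA) loop in the SVE style. It assumes a per-lane predicate modeled on SVE's WHILELT instruction and a loop increment equal to the number of 64-bit lanes (modeled on INCD). The function names whilelt and daxpy_vla are hypothetical illustrations, not the ACLE intrinsics API, and the code only simulates the idea in plain Python.

```python
# Hypothetical scalar model of an SVE-style vector-length-agnostic (VLA) loop.
# Names (whilelt, daxpy_vla) are illustrative only, not the ACLE intrinsics API.

def whilelt(i, n, lanes):
    """Predicate: lane k is active while i + k < n (models SVE's WHILELT)."""
    return [i + k < n for k in range(lanes)]


def daxpy_vla(a, x, y, vl_bits):
    """Compute a*x[i] + y[i] with a predicated loop that never hard-codes the vector length."""
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("vector length must be a multiple of 128 between 128 and 2048 bits")
    lanes = vl_bits // 64  # number of 64-bit (double) lanes per vector
    out = list(y)
    n = len(x)
    i = 0
    pred = whilelt(i, n, lanes)
    # Loop while any lane is active; the predicate handles the tail,
    # so no separate scalar epilogue loop is needed.
    while any(pred):
        for k, active in enumerate(pred):
            if active:  # inactive lanes are left untouched
                out[i + k] = a * x[i + k] + out[i + k]
        i += lanes  # advance by however many lanes this "implementation" has
        pred = whilelt(i, n, lanes)
    return out


# Usage: the same loop gives the same result at every legal vector length.
x = [float(v) for v in range(10)]
y = [1.0] * 10
results = {vl: daxpy_vla(2.0, x, y, vl) for vl in range(128, 2049, 128)}
assert all(r == results[128] for r in results.values())
```

The design point the sketch tries to capture is that the loop body reads the lane count at run time instead of baking it in, and the predicate masks off out-of-range lanes; this is what lets one binary run unchanged on implementations with different vector lengths.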
Abstract
This article describes the ARM Scalable Vector Extension (SVE). Several goals guided the design of the architecture. First was the need to extend the vector processing capability associated with the ARM AArch64 execution state to better address the computational requirements in domains such as high-performance computing, data analytics, computer vision, and machine learning. Second was the desire to introduce an extension that can scale across multiple implementations, both now and into the future, allowing CPU designers to choose the vector length most suitable for their power, performance, and area targets. Finally, the architecture should avoid imposing a software development cost as the vector length changes and where possible reduce it by improving the reach of compiler auto-vectorization technologies. SVE achieves these goals. It allows implementations to choose a vector register…
