SPC5: an efficient SpMV framework vectorized using ARM SVE and x86 AVX-512
Evann Regnault, Berenger Bramas

TL;DR
This paper presents SPC5, a sparse matrix/vector multiplication (SpMV) framework vectorized for ARM SVE and x86 AVX-512, and compares its kernels against a standard CSR kernel on both architectures.
Contribution
The paper ports the SPC5 SpMV framework to ARM SVE by converting its AVX-512 kernels, and compares performance across the two architectures.
Findings
SVE kernels outperform standard CSR on ARM hardware
AVX-512 kernels show high efficiency on x86 systems
Porting techniques enable cross-architecture performance gains
Abstract
The sparse matrix/vector product (SpMV) is a fundamental operation in scientific computing. Having access to an efficient SpMV implementation is therefore critical, if not mandatory, to solve challenging numerical problems. The ARM-based A64FX CPU is a modern hardware component that equips one of the fastest supercomputers in the world. This CPU supports the Scalable Vector Extension (SVE) vectorization technology, which has been less investigated than the classic x86 instruction set architectures. In this paper, we describe how we ported the SPC5 SpMV framework to the A64FX by converting AVX-512 kernels to SVE. In addition, we present performance results by comparing our kernels against a standard CSR kernel on both Intel AVX-512 and Fujitsu ARM SVE architectures.
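For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal scalar sketch of an SpMV over the compressed sparse row (CSR) format. It is not taken from the SPC5 code base: the `CsrMatrix` struct, its field names, and the `spmv_csr` function are hypothetical names chosen for illustration, and the actual baseline used in the paper may differ (for example in index types or threading).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container (illustrative names, not the SPC5 API).
struct CsrMatrix {
    std::size_t rows;                  // number of matrix rows
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i spans [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<double> values;        // the nonzero values, stored row by row
};

// Computes y = A * x with one scalar loop per row.
std::vector<double> spmv_csr(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        double sum = 0.0;
        // Accumulate the products of the nonzeros in row i with the
        // entries of x selected by their column indices (an indirect access).
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

The indirect read `x[a.col_idx[k]]` is the part that makes this loop hard to vectorize efficiently, which is the kind of cost a vectorized kernel tries to reduce.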
