SPC5: an efficient SpMV framework vectorized using ARM SVE and x86 AVX-512

Evann Regnault; Berenger Bramas

arXiv:2307.14774·cs.DC·July 28, 2023

TL;DR

This paper presents SPC5, an efficient sparse matrix/vector product (SpMV) framework vectorized for ARM SVE and x86 AVX-512, and reports its performance against a standard CSR kernel on both architectures.

Contribution

The paper ports the SPC5 SpMV framework to the ARM-based A64FX CPU by converting its AVX-512 kernels to SVE, and compares the resulting kernels against a standard CSR kernel on both Intel AVX-512 and Fujitsu ARM SVE architectures.

Findings

01

SVE kernels outperform standard CSR on ARM hardware

02

AVX-512 kernels show high efficiency on x86 systems

03

Porting techniques enable cross-architecture performance gains

Abstract

The sparse matrix/vector product (SpMV) is a fundamental operation in scientific computing. Having access to an efficient SpMV implementation is therefore critical, if not mandatory, to solve challenging numerical problems. The ARM-based A64FX CPU is a modern hardware component that equips one of the fastest supercomputers in the world. This CPU supports the Scalable Vector Extension (SVE) vectorization technology, which has been less investigated than the classic x86 instruction set architectures. In this paper, we describe how we ported the SPC5 SpMV framework to the A64FX by converting AVX512 kernels to SVE. In addition, we present performance results by comparing our kernels against a standard CSR kernel for both Intel-AVX512 and Fujitsu-ARM-SVE architectures.
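For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal scalar sketch of an SpMV over the compressed sparse row (CSR) format. It is a generic illustration, not the SPC5 kernels or the authors' CSR implementation; the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are assumptions chosen for this example.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int],
             col_idx: Sequence[int],
             values: Sequence[float],
             x: Sequence[float]) -> List[float]:
    """Compute y = A * x for a matrix A stored in CSR form.

    row_ptr[i] .. row_ptr[i + 1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] its value.
    (All names are hypothetical, chosen for this illustration.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Walk the nonzeros of row i; x is accessed indirectly via col_idx.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
print(y)  # [7.0, 6.0, 19.0]
```

The inner loop reads `x` through `col_idx`, an indirect access pattern that is generally awkward to vectorize directly, which is one reason vectorized SpMV kernels are a subject of study.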
