Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions
Berenger Bramas, Pavel Kus

TL;DR
This paper introduces AVX-512 optimized, zero-padding-free block-based kernels for sparse matrix-vector multiplication, improving performance on modern CPUs without the typical padding overhead.
Contribution
It presents new mask-based sparse matrix formats and kernels that eliminate zero padding, along with a method to select optimal kernel sizes based on matrix characteristics.
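To make the idea of a mask-based block format concrete, here is a minimal scalar sketch in Python. It is not the SPC5 implementation and not the paper's assembly kernels. The names `MaskedBlockMatrix`, `from_csr` and `spmv` are hypothetical, and the layout is an assumption: blocks of 1 row by 8 columns, each starting at the column of its first nonzero. Each block stores a start column and an 8-bit mask, while the value array holds only the nonzeros, so no zero is ever padded in. The inner loop over mask bits stands in for what an expand-load instruction such as AVX-512 `vexpandpd` could do in a single step.

```python
from dataclasses import dataclass

BLOCK_WIDTH = 8  # assumption: 8 doubles fill one 512-bit register


@dataclass
class MaskedBlockMatrix:
    """Hypothetical 1x8 mask-based block storage (illustrative only)."""
    n_rows: int
    n_cols: int
    row_ptr: list      # blocks of row i are row_ptr[i] .. row_ptr[i+1]-1
    block_col: list    # first column covered by each block
    block_mask: list   # bit k set => column block_col + k holds a nonzero
    values: list       # nonzeros only, row by row, with no padding


def from_csr(n_rows, n_cols, indptr, indices, data):
    """Build the masked-block layout from CSR arrays.

    Assumes column indices are sorted in increasing order within each row.
    """
    row_ptr, block_col, block_mask = [0], [], []
    for i in range(n_rows):
        start = None
        for idx in range(indptr[i], indptr[i + 1]):
            c = indices[idx]
            # Open a new block when the column falls outside the current one.
            if start is None or c >= start + BLOCK_WIDTH:
                start = c
                block_col.append(c)
                block_mask.append(0)
            block_mask[-1] |= 1 << (c - start)
        row_ptr.append(len(block_col))
    return MaskedBlockMatrix(n_rows, n_cols, row_ptr, block_col,
                             block_mask, list(data))


def spmv(m, x):
    """Compute y = A @ x, reading values sequentially and guided by the masks."""
    y = [0.0] * m.n_rows
    v = 0  # running position in the packed value array
    for i in range(m.n_rows):
        acc = 0.0
        for b in range(m.row_ptr[i], m.row_ptr[i + 1]):
            col, mask = m.block_col[b], m.block_mask[b]
            # Scalar emulation of a masked expand: one value per set bit.
            for k in range(BLOCK_WIDTH):
                if (mask >> k) & 1:
                    acc += m.values[v] * x[col + k]
                    v += 1
        y[i] = acc
    return y
```

In this sketch the per-block overhead is one column index and one mask byte, whatever the number of nonzeros in the block. The value array stays exactly as long as the CSR one.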
Findings
Significant performance improvements over the Intel MKL CSR kernel
An effective method for predicting the best kernel for a given matrix
An open-source implementation in the SPC5 library
Abstract
The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields. The High Performance Computing (HPC) community has therefore continuously invested a lot of effort to provide an efficient SpMV kernel on modern CPU architectures. Although it has been shown that block-based kernels help to achieve high performance, they are difficult to use in practice because of the zero padding they require. In the current paper, we propose new kernels using the AVX-512 instruction set, which makes it possible to use a blocking scheme without any zero padding in the matrix memory storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels highly optimized in assembly language. Considering that the optimal blocking size depends on the matrix, we also provide a method to predict the best kernel to be used utilizing a…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
