Heterogeneous Sparse Matrix-Vector Multiplication via Compressed Sparse Row Format
Phillip Allen Lane, Joshua Dennis Booth

TL;DR
This paper introduces CSR-k, a heterogeneous sparse matrix format based on CSR that enables efficient, portable SpMV performance across diverse HPC hardware through hierarchical row grouping and simple tuning.
Contribution
The paper proposes CSR-k, a novel, easy-to-tune heterogeneous sparse matrix format that improves performance portability for SpMV on various CPU and GPU architectures.
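To make the idea of hierarchical row grouping concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The first function is standard CSR SpMV. The second adds one extra pointer array, here given the hypothetical name `sr_ptr`, that partitions consecutive rows into groups, so that each group could be handed to one core or thread block. The array names, the group sizes, and the example matrix are all assumptions made for illustration.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Standard CSR SpMV: returns y = A * x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i live in vals[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def spmv_grouped(sr_ptr, row_ptr, col_idx, vals, x):
    """Two-level sketch: sr_ptr (hypothetical name) groups consecutive rows.

    Group s covers rows sr_ptr[s] .. sr_ptr[s + 1] - 1. The underlying CSR
    arrays are unchanged; only one small pointer array is added on top.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for s in range(len(sr_ptr) - 1):  # outer loop: one unit of parallel work
        for i in range(sr_ptr[s], sr_ptr[s + 1]):
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += vals[k] * x[col_idx[k]]
            y[i] = acc
    return y


# Illustrative 4x4 matrix with 6 nonzeros (made-up data):
# [1 0 2 0]
# [0 3 0 0]
# [4 0 0 5]
# [0 0 6 0]
row_ptr = [0, 2, 3, 5, 6]
col_idx = [0, 2, 1, 0, 3, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]
sr_ptr = [0, 2, 4]  # two groups of two rows each (assumed group size)

y_csr = spmv_csr(row_ptr, col_idx, vals, x)
y_grouped = spmv_grouped(sr_ptr, row_ptr, col_idx, vals, x)
```

Both functions compute the same product. The grouped version differs only in how the work is partitioned, which is why a format built this way can stay compatible with plain CSR while the group size is tuned per device.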
Findings
CSR-k outperforms the average performance of Intel MKL on Intel Xeon Platinum 8380 and AMD Epyc 7742 CPUs.
CSR-k surpasses cuSPARSE and KokkosKernels on GPUs.
Device-specific parameters can be tuned efficiently using a constant-time model.
Abstract
Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in high-performance computing (HPC), yet SpMV normally suffers from poor performance on many devices. Because of this poor performance, SpMV normally requires special care in how the matrix is stored and tuned for a given device. Moreover, HPC is facing heterogeneous hardware containing multiple different compute units, e.g., many-core CPUs and GPUs. Therefore, an emerging goal has been to produce heterogeneous formats and methods that allow critical kernels, e.g., SpMV, to be executed on different devices with portable performance and minimal changes to format and method. This paper presents a heterogeneous format based on CSR, named CSR-k, that can be tuned quickly and outperforms the average performance of Intel MKL on Intel Xeon Platinum 8380 and AMD Epyc 7742 CPUs while still outperforming NVIDIA's cuSPARSE and Sandia National…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems
