Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors
Athena Elafrou, Georgios Goumas, Nektarios Koziris

TL;DR
This paper introduces a runtime-adaptive optimizer for sparse matrix-vector multiplication that identifies bottlenecks and applies suitable optimizations, significantly improving performance across diverse processors and matrices.
Contribution
It presents a low-overhead, matrix- and architecture-adaptive SpMV optimizer that dynamically selects optimizations based on the detected performance bottlenecks, improving efficiency on modern multi- and many-core processors.
Findings
Achieves significant speedups over Intel MKL SpMV kernels.
Effectively distinguishes and optimizes for diverse matrices and architectures.
Demonstrates practical low-overhead optimization setup.
Abstract
This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among different processors, together with structural diversity among different sparse matrices, leads to bottleneck diversity. This justifies an SpMV optimizer that is both matrix- and architecture-adaptive through runtime specialization. To this end, we present an approach that first identifies the performance bottlenecks of SpMV for a given sparse matrix on the target platform, either through profiling or by matrix property inspection, and then selects suitable optimizations to tackle those bottlenecks. Our optimization pool is based on the widely used Compressed Sparse Row (CSR) sparse matrix storage format and has low preprocessing overheads, making our overall approach practical even in cases where fast decision making and optimization setup…
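For readers unfamiliar with CSR, the following is a minimal sketch of a baseline CSR SpMV kernel together with one simple form of matrix property inspection. It is an illustration only, not the authors' optimizer: the function names, the choice of row-length statistics as the inspected property, and the `threshold` value are all assumptions made for this example.

```python
from statistics import mean, pstdev


def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and values.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def row_length_stats(row_ptr):
    """Return mean, max, and standard deviation of nonzeros per row."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    if not lengths:
        return {"mean": 0.0, "max": 0, "std": 0.0}
    return {"mean": mean(lengths), "max": max(lengths), "std": pstdev(lengths)}


def looks_imbalanced(row_ptr, threshold=4.0):
    """Hypothetical heuristic (not from the paper): flag a matrix whose
    longest row exceeds `threshold` times the mean row length, which
    may hint at load imbalance under a row-wise work partitioning."""
    stats = row_length_stats(row_ptr)
    return stats["mean"] > 0 and stats["max"] / stats["mean"] > threshold


# 3x3 example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0])
```

The inspection step reads only `row_ptr`, so it costs time linear in the number of rows. That is the kind of low-overhead property check the abstract alludes to, although the features and decision logic actually used in the paper may differ.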
