MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs
Jordi Wolfson-Pou, Jan Laukemann, Fabrizio Petrini

TL;DR
MAGNUS is a cache-aware, locality-generating algorithm for sparse general matrix-matrix multiplication (SpGEMM) on CPUs. It improves performance over existing methods by reordering the intermediate product into cache-friendly chunks and using hybrid accumulators.
Contribution
The paper proposes a novel, system-aware reordering algorithm that enhances data locality and employs hybrid accumulators, outperforming current multithreaded SpGEMM implementations.
Findings
MAGNUS outperforms Intel MKL and other baselines on various matrices.
It scales efficiently to large matrices, maintaining high performance.
MAGNUS approaches the optimal computational bound for massive matrices.
Abstract
Sparse general matrix-matrix multiplication (SpGEMM) is a critical operation in many applications. Current multithreaded implementations are based on Gustavson's algorithm and often perform poorly on large matrices due to limited cache reuse by the accumulators. We present MAGNUS (Matrix Algebra for Gigantic NUmerical Systems), a novel algorithm to maximize data locality in SpGEMM. To generate locality, MAGNUS reorders the intermediate product into discrete cache-friendly chunks using a two-level hierarchical approach. The accumulator is applied to each chunk, where the chunk size is chosen such that the accumulator is cache-efficient. MAGNUS is input- and system-aware: based on the matrix characteristics and target system specifications, the optimal number of chunks is computed by minimizing the storage cost of the necessary data structures. MAGNUS allows for a hybrid accumulation…
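The chunking idea in the abstract can be illustrated with a small sketch. This is not the MAGNUS implementation: it is a single-threaded, single-level toy version of row-wise (Gustavson-style) SpGEMM in which the intermediate products of each output row are first bucketed by column range and then accumulated one bucket at a time, so that each dense accumulator spans only `chunk_width` columns. The function name `spgemm_row_chunked`, the list-of-rows sparse format, and the fixed `n_chunks` argument are all assumptions made for illustration; the paper describes a two-level hierarchy and chooses the number of chunks from matrix and system parameters.

```python
def spgemm_row_chunked(a_rows, b_rows, n_cols, n_chunks):
    """Compute C = A * B row by row with column-chunked accumulation.

    a_rows, b_rows: sparse matrices stored as a list of rows, where each
    row is a list of (column, value) pairs (a hypothetical toy format).
    n_cols: number of columns of B (and therefore of C).
    n_chunks: number of column chunks; fixed by the caller in this sketch.
    """
    chunk_width = -(-n_cols // n_chunks)  # ceiling division
    c_rows = []
    for a_row in a_rows:
        # Step 1: form the intermediate products for this output row and
        # reorder them into buckets, one bucket per column chunk.
        buckets = [[] for _ in range(n_chunks)]
        for k, a_val in a_row:
            for j, b_val in b_rows[k]:
                buckets[j // chunk_width].append((j, a_val * b_val))

        # Step 2: apply a small dense accumulator to each chunk in turn.
        # Only chunk_width entries are touched per chunk, which is the
        # property that would keep the accumulator cache-resident.
        out = []
        for c, bucket in enumerate(buckets):
            if not bucket:
                continue
            base = c * chunk_width
            acc = [0.0] * chunk_width
            seen = [False] * chunk_width
            for j, v in bucket:
                acc[j - base] += v
                seen[j - base] = True
            out.extend((base + i, acc[i]) for i in range(chunk_width) if seen[i])
        c_rows.append(out)
    return c_rows


# Toy inputs: A is 2x3, B is 3x4, both in the list-of-rows format above.
A = [[(0, 1.0), (1, 1.0), (2, 2.0)], [(1, 3.0)]]
B = [[(1, 1.0)], [(0, 2.0), (1, 5.0), (3, 3.0)], [(2, 4.0)]]
C = spgemm_row_chunked(A, B, n_cols=4, n_chunks=2)
```

With `n_chunks=1` the sketch reduces to plain Gustavson with one dense accumulator as wide as the output row, which is the case the abstract identifies as having limited cache reuse on large matrices. Because chunks are processed in increasing column order, the output rows come out sorted by column.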
