Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures
Mehmet Deveci, Christian Trott, Sivasankaran Rajamanickam

TL;DR
This paper presents parallel algorithms for sparse matrix-matrix multiplication optimized for many-core and GPU architectures, emphasizing performance portability and data structure choices, with a meta-algorithm for adaptive selection.
Contribution
It introduces a meta-algorithm, kkSpGEMM, that adaptively selects algorithms and data structures based on problem characteristics for improved performance.
Findings
Performance depends on the accumulator data structure used inside each algorithm.
kkSpGEMM chooses an algorithm and data structure based on the characteristics of the problem.
Two-phase implementations are recommended for efficiency.
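To make the accumulator finding concrete, the sketch below computes one row of C = A·B with two interchangeable accumulators: a dense array with one slot per column, and a hash map keyed by column index. It assumes CSR storage for B. The function names and the plain-Python style are illustrative assumptions, not the paper's kkSpGEMM implementation.

```python
def row_times_matrix_dense(a_cols, a_vals, b_rowptr, b_cols, b_vals, n_cols):
    """Multiply one sparse row of A by CSR matrix B using a dense accumulator.

    Memory is proportional to n_cols, but each update is a direct array access.
    """
    acc = [0.0] * n_cols
    occupied = [False] * n_cols
    touched = []  # columns that received at least one contribution
    for k, a in zip(a_cols, a_vals):
        for p in range(b_rowptr[k], b_rowptr[k + 1]):
            j = b_cols[p]
            if not occupied[j]:
                occupied[j] = True
                touched.append(j)
            acc[j] += a * b_vals[p]
    touched.sort()
    return touched, [acc[j] for j in touched]


def row_times_matrix_hash(a_cols, a_vals, b_rowptr, b_cols, b_vals):
    """Same product using a hash-map accumulator.

    Memory is proportional to the number of nonzeros in the output row,
    at the cost of hashing on every update.
    """
    acc = {}
    for k, a in zip(a_cols, a_vals):
        for p in range(b_rowptr[k], b_rowptr[k + 1]):
            j = b_cols[p]
            acc[j] = acc.get(j, 0.0) + a * b_vals[p]
    cols = sorted(acc)
    return cols, [acc[j] for j in cols]


# B = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]] in CSR form
B_ROWPTR = [0, 2, 3, 5]
B_COLS = [0, 2, 1, 0, 2]
B_VALS = [1.0, 2.0, 3.0, 4.0, 5.0]

# Row of A = [2, 0, 1]
ROW_COLS = [0, 2]
ROW_VALS = [2.0, 1.0]

dense_result = row_times_matrix_dense(ROW_COLS, ROW_VALS, B_ROWPTR, B_COLS, B_VALS, 3)
hash_result = row_times_matrix_hash(ROW_COLS, ROW_VALS, B_ROWPTR, B_COLS, B_VALS)
```

Both functions return the same sorted column indices and values. They differ only in memory footprint and access pattern, which is the kind of trade-off that can favor different accumulators on different architectures.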
Abstract
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse…
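The abstract is cut off before it says what "two phase" means. Assuming it refers to the common split into a symbolic phase (compute the sparsity structure and size of C) followed by a numeric phase (compute the values), a minimal serial sketch in CSR format could look like the following. The function name `spgemm_two_phase` and the use of Python sets and dicts as accumulators are assumptions for illustration, not the authors' code.

```python
def spgemm_two_phase(n_rows, a_rowptr, a_cols, a_vals, b_rowptr, b_cols, b_vals):
    """Compute C = A * B for CSR inputs in two passes (illustrative sketch)."""
    # Phase 1 (symbolic): count the structural nonzeros in each row of C
    # so that the output arrays can be allocated exactly once.
    c_rowptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        seen = set()
        for q in range(a_rowptr[i], a_rowptr[i + 1]):
            k = a_cols[q]
            for p in range(b_rowptr[k], b_rowptr[k + 1]):
                seen.add(b_cols[p])
        c_rowptr[i + 1] = c_rowptr[i] + len(seen)

    # Phase 2 (numeric): accumulate values row by row and write them
    # into the preallocated arrays.
    nnz = c_rowptr[n_rows]
    c_cols = [0] * nnz
    c_vals = [0.0] * nnz
    for i in range(n_rows):
        acc = {}
        for q in range(a_rowptr[i], a_rowptr[i + 1]):
            k = a_cols[q]
            a = a_vals[q]
            for p in range(b_rowptr[k], b_rowptr[k + 1]):
                j = b_cols[p]
                acc[j] = acc.get(j, 0.0) + a * b_vals[p]
        for offset, j in enumerate(sorted(acc)):
            c_cols[c_rowptr[i] + offset] = j
            c_vals[c_rowptr[i] + offset] = acc[j]
    return c_rowptr, c_cols, c_vals


# A = [[1, 2],      B = [[4, 0],
#      [0, 3]]           [5, 6]]
c_rowptr, c_cols, c_vals = spgemm_two_phase(
    2,
    [0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0],
    [0, 1, 3], [0, 0, 1], [4.0, 5.0, 6.0],
)
```

One practical reason to separate the phases is that the symbolic result can be reused when the same sparsity pattern is multiplied repeatedly with different values. Each row is also independent in both phases, which is where a parallel implementation would distribute work.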
