TL;DR
This paper presents a new framework for efficient sparse matrix-matrix multiplication on GPUs and heterogeneous processors, addressing irregularity, memory management, and load balancing to improve performance.
Contribution
It introduces a hybrid memory pre-allocation method for the result matrix, a GPU merge path algorithm for parallel insertions, and load balancing based on the number of arithmetic operations, advancing the state of the art in SpGEMM implementations.
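To make the insertion step concrete, here is a minimal sequential sketch of merging newly computed entries into a partially built, column-sorted result row, summing values that land on the same column. It is not the paper's GPU merge path algorithm, which partitions this kind of merge across threads; it only illustrates the sorted-merge-with-accumulation operation. The function name `merge_accumulate` and the `(col, val)` tuple layout are assumptions made for this example.

```python
def merge_accumulate(row, incoming):
    """Merge two column-sorted lists of (col, val) pairs.

    Assumes each input list has strictly increasing column indices.
    Entries that share a column index are summed into a single entry.
    """
    out = []
    i = j = 0
    while i < len(row) and j < len(incoming):
        ci, vi = row[i]
        cj, vj = incoming[j]
        if ci < cj:
            out.append((ci, vi))
            i += 1
        elif cj < ci:
            out.append((cj, vj))
            j += 1
        else:
            # Same column: accumulate instead of inserting a duplicate.
            out.append((ci, vi + vj))
            i += 1
            j += 1
    # At most one of these tails is non-empty.
    out.extend(row[i:])
    out.extend(incoming[j:])
    return out


partial_row = [(0, 1.0), (2, 2.0)]
new_products = [(1, 3.0), (2, 4.0), (5, 1.0)]
merged = merge_accumulate(partial_row, new_products)
```

Because both inputs stay sorted, the merged row is sorted as well, so the same routine can be applied repeatedly as further products for that row arrive.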
Findings
Achieves higher throughput on heterogeneous processors.
Outperforms existing CPU and GPU SpGEMM methods.
Demonstrates significant speedups on diverse benchmarks.
Abstract
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as the algebraic multigrid method (AMG), breadth-first search, and the shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data in both input matrices. In this work we propose a framework for SpGEMM on GPUs and emerging CPU-GPU heterogeneous processors. This framework particularly focuses on the above three problems. Memory pre-allocation for the resulting matrix is organized by a hybrid method that saves a large…
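Problems (1) and (3) can be illustrated with a short sketch, assuming both inputs are stored in CSR format. For C = A * B, the number of intermediate products in row i of C is the sum of the row lengths of B over the column indices that appear in row i of A. That count is an upper bound on the nonzeros in row i of C, and it is also the number of multiplications the row requires, so it can serve as a per-row work estimate. The function name and argument names below are assumptions for this example, not the paper's code.

```python
def row_upper_bounds(a_indptr, a_indices, b_indptr):
    """Count intermediate products per row of C = A * B (CSR inputs).

    a_indptr, a_indices: CSR row pointers and column indices of A.
    b_indptr: CSR row pointers of B.
    Returns a list whose i-th entry bounds nnz(C[i, :]) from above
    and equals the number of multiplications needed for that row.
    """
    bounds = []
    for i in range(len(a_indptr) - 1):
        total = 0
        for p in range(a_indptr[i], a_indptr[i + 1]):
            k = a_indices[p]                      # column of A = row of B
            total += b_indptr[k + 1] - b_indptr[k]  # nnz in row k of B
        bounds.append(total)
    return bounds


# A (3x3): row 0 has columns {0, 2}, row 1 is empty, row 2 has column {1}.
a_indptr = [0, 2, 2, 3]
a_indices = [0, 2, 1]
# B (3x3): row 0 has 2 nonzeros, row 1 has 1, row 2 has 3.
b_indptr = [0, 2, 3, 6]

bounds = row_upper_bounds(a_indptr, a_indices, b_indptr)
```

In this example row 0 gets a bound of 5 even though C has only 3 columns, which shows why allocating the upper bound for every row can waste memory and why the exact row length is only known after duplicate column indices have been combined.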
