Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU
Yifan Li, Giulia Guidi

TL;DR
This paper introduces an estimation-based workflow for sparse general matrix-matrix multiplication (SpGEMM) on GPUs that replaces the costly exact symbolic pass with lightweight HyperLogLog estimators, yielding 1.4x-2.8x speedups on NVIDIA A100 and H100 GPUs.
Contribution
It proposes a novel estimation workflow and hybrid accumulator design that outperform existing GPU SpGEMM solutions by reducing symbolic computation overhead.
Findings
Achieves 1.4x-2.8x speedup on NVIDIA A100 and H100 GPUs.
Replaces the exact symbolic pass, which accounts for roughly 28% of total runtime on average in current two-pass solutions, with lightweight HyperLogLog estimators.
Outperforms leading GPU SpGEMM implementations across various matrices.
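To make the HyperLogLog idea concrete, here is a minimal CPU sketch in Python of a generic HyperLogLog cardinality estimator, used to approximate how many distinct column indices fall into one output row. It is not the paper's GPU implementation: the class name `HyperLogLog`, the register count `2**p`, and the use of `blake2b` as the hash are all assumptions chosen for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Generic HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=10):
        self.p = p                      # assumed precision; not from the paper
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash; blake2b is a stand-in for a fast hash a GPU kernel would use.
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                    # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)          # small-range (linear counting) correction
        return raw


def estimate_distinct(column_indices, p=10):
    """Estimate the number of distinct column indices without storing them all."""
    sketch = HyperLogLog(p)
    for j in column_indices:
        sketch.add(j)
    return sketch.estimate()
```

The sketch uses a fixed amount of memory (`2**p` small registers) regardless of how many indices are inserted, and duplicates do not change the registers. Those two properties are what make an estimator of this kind a candidate replacement for exact counting; the result is approximate, so any output buffer sized from it would need some headroom or a fallback.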
Abstract
In computational science and data analytics, many workloads involve irregular and sparse computations that are inherently difficult to optimize for modern hardware. A key kernel is Sparse General Matrix-Matrix Multiplication (SpGEMM), which underpins simulations, graph analytics, and machine learning applications. SpGEMM exhibits irregular memory access patterns and workload imbalance, making it challenging to achieve high performance on GPUs. Current GPU SpGEMM solutions typically rely on a two-pass workflow to address load imbalance and reduce memory access. The symbolic pass, which determines the number of output elements per row, accounts for roughly 28% of the total runtime on average. In this work, we question the necessity of exact symbolic computation and introduce an estimation-based SpGEMM workflow. Our approach replaces the costly symbolic step with lightweight HyperLogLog…
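For readers unfamiliar with the two-pass workflow the abstract refers to, the following is a minimal CPU sketch in Python of a row-wise SpGEMM over CSR arrays. The symbolic pass counts the structural nonzeros of each output row, and the numeric pass uses those counts to allocate exactly sized output arrays before computing values. The function names `symbolic_pass` and `numeric_pass` and the tuple layout `(row_ptr, col_idx, values)` are assumptions for illustration, not the authors' GPU kernels.

```python
def symbolic_pass(a_ptr, a_idx, b_ptr, b_idx):
    """Count structural nonzeros in each row of C = A @ B (no values computed)."""
    row_nnz = []
    for i in range(len(a_ptr) - 1):
        cols = set()
        for k in a_idx[a_ptr[i]:a_ptr[i + 1]]:
            # Row i of C touches every column present in row k of B.
            cols.update(b_idx[b_ptr[k]:b_ptr[k + 1]])
        row_nnz.append(len(cols))
    return row_nnz


def numeric_pass(a, b, row_nnz):
    """Compute C = A @ B in CSR form, using exact per-row counts for allocation."""
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    # Prefix sum over the counts gives the output row pointers.
    c_ptr = [0]
    for n in row_nnz:
        c_ptr.append(c_ptr[-1] + n)
    c_idx = [0] * c_ptr[-1]
    c_val = [0.0] * c_ptr[-1]
    for i in range(len(a_ptr) - 1):
        acc = {}  # per-row accumulator keyed by output column
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_idx[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_val[p] * b_val[q]
        for off, j in enumerate(sorted(acc)):
            c_idx[c_ptr[i] + off] = j
            c_val[c_ptr[i] + off] = acc[j]
    return c_ptr, c_idx, c_val


# A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 1], [4, 0], [5, 6]] in CSR form.
A = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2, 4], [1, 0, 0, 1], [1.0, 4.0, 5.0, 6.0])
counts = symbolic_pass(A[0], A[1], B[0], B[1])
C = numeric_pass(A, B, counts)
```

Note that the symbolic pass walks the same index structure as the numeric pass, which is why it can take a sizable share of the total runtime. An estimation-based workflow would replace `symbolic_pass` with an approximate count.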
