Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU

Yifan Li; Giulia Guidi

arXiv:2604.19004 · cs.DC · April 22, 2026

TL;DR

This paper introduces an estimation-based workflow for Sparse General Matrix-Matrix Multiplication (SpGEMM) on GPUs. It replaces the costly exact symbolic pass with lightweight HyperLogLog estimators and reports speedups of 1.4x-2.8x on NVIDIA A100 and H100 GPUs.

Contribution

The paper proposes an estimation-based workflow and a hybrid accumulator design that outperform existing GPU SpGEMM solutions by reducing the overhead of symbolic computation.

Findings

01. Achieves 1.4x-2.8x speedup on NVIDIA A100 and H100 GPUs.

02. Replaces the exact symbolic pass with HyperLogLog estimators to reduce its overhead.

03. Outperforms leading GPU SpGEMM implementations across various matrices.

Abstract

In computational science and data analytics, many workloads involve irregular and sparse computations that are inherently difficult to optimize for modern hardware. A key kernel is Sparse General Matrix-Matrix Multiplication (SpGEMM), which underpins simulations, graph analytics, and machine learning applications. SpGEMM exhibits irregular memory access patterns and workload imbalance, making it challenging to achieve high performance on GPUs. Current GPU SpGEMM solutions typically rely on a two-pass workflow to address load imbalance and reduce memory access. The symbolic pass, which determines the number of output elements per row, accounts for roughly 28% of the total runtime on average. In this work, we question the necessity of exact symbolic computation and introduce an estimation-based SpGEMM workflow. Our approach replaces the costly symbolic step with lightweight HyperLogLog…
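To make the idea in the abstract concrete, the following is a minimal CPU-side sketch in Python, not the authors' GPU implementation. It contrasts an exact symbolic pass (counting the distinct output columns of each row of C = A·B) with a HyperLogLog estimate of the same counts. Everything here is an assumption made for illustration: the CSR inputs as plain lists, the function names, the 64-register sketch size, and the splitmix64 mixer used as the hash.

```python
import math

P = 6                       # index bits per sketch (assumed value, not from the paper)
M = 1 << P                  # 64 registers per sketch
ALPHA = 0.709               # standard HyperLogLog bias constant for m = 64
MASK64 = (1 << 64) - 1


def splitmix64(x):
    """Deterministic 64-bit mixer, used here as the hash (an assumption)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def hll_sketch(items):
    """Build an M-register HyperLogLog sketch of an iterable of integers."""
    regs = [0] * M
    for item in items:
        h = splitmix64(item)
        idx = h >> (64 - P)                       # top P bits pick the register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 58 bits
        rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
        regs[idx] = max(regs[idx], rank)
    return regs


def hll_merge(a, b):
    """Sketch of a set union: element-wise maximum of the registers."""
    return [max(x, y) for x, y in zip(a, b)]


def hll_estimate(regs):
    """Cardinality estimate with the usual small-range (linear counting) fix."""
    raw = ALPHA * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros > 0:
        return M * math.log(M / zeros)
    return raw


def exact_row_nnz(a_indptr, a_indices, b_indptr, b_indices):
    """Exact symbolic pass: distinct output columns per row of C = A * B."""
    counts = []
    for i in range(len(a_indptr) - 1):
        cols = set()
        for k in a_indices[a_indptr[i]:a_indptr[i + 1]]:
            cols.update(b_indices[b_indptr[k]:b_indptr[k + 1]])
        counts.append(len(cols))
    return counts


def estimated_row_nnz(a_indptr, a_indices, b_indptr, b_indices):
    """Estimated pass: sketch each row of B once, then merge per row of A."""
    b_sketches = [hll_sketch(b_indices[b_indptr[k]:b_indptr[k + 1]])
                  for k in range(len(b_indptr) - 1)]
    estimates = []
    for i in range(len(a_indptr) - 1):
        merged = [0] * M
        for k in a_indices[a_indptr[i]:a_indptr[i + 1]]:
            merged = hll_merge(merged, b_sketches[k])
        estimates.append(hll_estimate(merged))
    return estimates


if __name__ == "__main__":
    # Tiny CSR example: A is 3x3, B is 3x4 (structure only, no values).
    a_indptr, a_indices = [0, 2, 2, 3], [0, 1, 2]
    b_indptr, b_indices = [0, 2, 4, 5], [0, 1, 1, 2, 3]
    print(exact_row_nnz(a_indptr, a_indices, b_indptr, b_indices))  # [3, 0, 1]
    print(estimated_row_nnz(a_indptr, a_indices, b_indptr, b_indices))
```

The property this sketch relies on is that HyperLogLog registers merge by element-wise maximum, so the estimate for a row of C needs only a fixed number of register comparisons per nonzero of A instead of enumerating and deduplicating every intermediate product. The estimates are approximate, so a real workflow would need some way to handle rows whose true size exceeds the estimate. How the paper does this is not covered in the excerpt above.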
