OpSparse: a Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs
Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Linyong Huang, Hongzhong Zheng, Yuan Xie

TL;DR
OpSparse is a highly optimized GPU library for sparse matrix multiplication that significantly outperforms existing libraries by applying low-level architecture-specific optimizations.
Contribution
The paper introduces OpSparse, a GPU-optimized SpGEMM library that incorporates low-level optimizations neglected by prior high-level algorithm-focused libraries.
Findings
OpSparse achieves up to 27.8x speedup over cuSPARSE.
OpSparse outperforms nsparse and spECK by up to 1.81x and 2.04x, respectively.
The optimizations improve load balancing, memory utilization, and parallelism.
Abstract
Sparse general matrix multiplication (SpGEMM) is an important and expensive computation primitive in many real-world applications. Due to SpGEMM's inherent irregularity and the vast diversity of its input matrices, developing a high-performance SpGEMM implementation on modern processors such as GPUs is challenging. The state-of-the-art SpGEMM libraries (i.e., nsparse and spECK) adopt several algorithms to tackle the challenges of global load balance, local load balance, and allocation of the result matrix. While these libraries focus on the high-level algorithm design for SpGEMM, they neglect several low-level architecture-specific optimizations, which causes inefficient implementations in their libraries. In this paper, we classify their inefficient implementations into seven categories. Based on our observations, we propose a highly optimized SpGEMM library called OpSparse. The…
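To make the abstract's terms concrete, here is a minimal CPU sketch of row-wise SpGEMM on CSR inputs. It is an illustration only, not OpSparse's GPU implementation: the function name `spgemm_csr`, the CSR argument names, and the dictionary-based accumulator are all assumptions made for this example. The first pass counts nonzeros per output row, which is the "allocation of the result matrix" problem. The second pass accumulates the products. The amount of work differs from row to row, which is where the load-balance challenges come from.

```python
# Illustrative row-wise SpGEMM on CSR matrices (two-phase: symbolic + numeric).
# Hypothetical names throughout; this is NOT the OpSparse implementation.

def spgemm_csr(a_ptr, a_col, a_val, b_ptr, b_col, b_val):
    """Compute C = A * B for CSR inputs; returns (c_ptr, c_col, c_val)."""
    n_rows = len(a_ptr) - 1

    # Symbolic phase: count the nonzeros of each row of C so that the
    # result arrays can be allocated with the right size up front.
    c_ptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        cols = set()
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_col[k]  # row of B selected by this nonzero of A
            cols.update(b_col[b_ptr[j]:b_ptr[j + 1]])
        c_ptr[i + 1] = c_ptr[i] + len(cols)

    # Numeric phase: accumulate partial products per row in a hash table
    # (a dict here), then write them out in column order.
    c_col = [0] * c_ptr[-1]
    c_val = [0.0] * c_ptr[-1]
    for i in range(n_rows):
        acc = {}
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_col[k]
            for t in range(b_ptr[j], b_ptr[j + 1]):
                acc[b_col[t]] = acc.get(b_col[t], 0.0) + a_val[k] * b_val[t]
        for off, col in enumerate(sorted(acc)):
            c_col[c_ptr[i] + off] = col
            c_val[c_ptr[i] + off] = acc[col]
    return c_ptr, c_col, c_val


# Usage: A = [[1, 2], [0, 3]], B = [[0, 4], [5, 0]] in CSR form.
c_ptr, c_col, c_val = spgemm_csr(
    [0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0],
    [0, 1, 2], [1, 0], [4.0, 5.0],
)
```

In this sketch every row is handled the same way. On a GPU, rows with very different amounts of work would need different treatment to keep the hardware busy, and that is the kind of concern the abstract groups under global and local load balance.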
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Distributed and Parallel Computing Systems
