OpSparse: a Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs

Zhaoyang Du; Yijin Guan; Tianchan Guan; Dimin Niu; Linyong Huang; Hongzhong Zheng; Yuan Xie

arXiv:2206.07244·cs.DC·June 16, 2022

TL;DR

OpSparse is a highly optimized GPU library for sparse matrix multiplication that significantly outperforms existing libraries by applying low-level architecture-specific optimizations.

Contribution

The paper introduces OpSparse, a GPU-optimized SpGEMM library that incorporates low-level optimizations neglected by prior high-level algorithm-focused libraries.

Findings

01. OpSparse achieves up to 27.8x speedup over cuSPARSE.

02. OpSparse outperforms nsparse and spECK by up to 1.81x and 2.04x, respectively.

03. The optimizations improve load balancing, memory utilization, and parallelism.
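
To make the load-balancing finding concrete, here is a minimal sketch of one common approach to global load balance in GPU SpGEMM: estimate the work of each output row and group rows of similar cost so that each group can be handled by a suitably sized kernel. This is an illustration only, not OpSparse's actual scheme. The function names and the bin boundaries (32, 512, 8192) are hypothetical, and the matrices are assumed to be in CSR form (row-pointer and column-index arrays).

```python
import bisect


def intermediate_products(a_ptr, a_col, b_ptr):
    """Upper bound on the work per output row of C = A * B.

    For row i of A, sum the nonzero counts of every row of B that the
    row touches. Only CSR row pointers and column indices are needed.
    """
    return [
        sum(b_ptr[a_col[k] + 1] - b_ptr[a_col[k]]
            for k in range(a_ptr[i], a_ptr[i + 1]))
        for i in range(len(a_ptr) - 1)
    ]


def bin_rows(work, bounds=(32, 512, 8192)):
    """Group row indices into len(bounds) + 1 bins by estimated work.

    The boundaries are hypothetical; a row with work w lands in the
    first bin whose upper bound is >= w, or in the last bin otherwise.
    """
    bins = [[] for _ in range(len(bounds) + 1)]
    for row, w in enumerate(work):
        bins[bisect.bisect_left(bounds, w)].append(row)
    return bins
```

On a GPU, each bin would typically be launched as a separate kernel with a thread-block size and scratch-space budget matched to the bin's cost range, so that cheap rows do not idle next to expensive ones.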

Abstract

Sparse general matrix multiplication (SpGEMM) is an important and expensive computation primitive in many real-world applications. Due to SpGEMM's inherent irregularity and the vast diversity of its input matrices, developing a high-performance SpGEMM implementation on modern processors such as GPUs is challenging. The state-of-the-art SpGEMM libraries (i.e., nsparse and spECK) adopt several algorithms to tackle the challenges of global load balance, local load balance, and allocation of the result matrix. While these libraries focus on the high-level algorithm design for SpGEMM, they neglect several low-level architecture-specific optimizations, which causes inefficient implementations in their libraries. In this paper, we classify their inefficient implementations into seven categories. Based on our observations, we propose a highly optimized SpGEMM library called OpSparse. The…
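
For readers unfamiliar with why allocating the result matrix is a challenge, the sketch below shows a row-wise, two-phase SpGEMM on CSR inputs. A symbolic phase counts the nonzeros of each output row so the result can be allocated once, and a numeric phase then accumulates the products. It is a sequential reference sketch, not the OpSparse implementation: the function name `spgemm_csr` is hypothetical, and a Python `dict` stands in for the per-row hash table that a GPU library would keep in fast on-chip memory.

```python
def spgemm_csr(a_ptr, a_col, a_val, b_ptr, b_col, b_val):
    """Compute C = A * B for CSR inputs; returns (c_ptr, c_col, c_val)."""
    n_rows = len(a_ptr) - 1

    # Symbolic phase: count distinct column indices per output row so
    # the result arrays can be sized exactly before any value is written.
    c_ptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        cols = set()
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_col[k]
            cols.update(b_col[b_ptr[j]:b_ptr[j + 1]])
        c_ptr[i + 1] = c_ptr[i] + len(cols)

    c_col = [0] * c_ptr[-1]
    c_val = [0.0] * c_ptr[-1]

    # Numeric phase: accumulate partial products of each output row in a
    # hash table keyed by column index, then write them out in sorted order.
    for i in range(n_rows):
        acc = {}
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_col[k]
            for t in range(b_ptr[j], b_ptr[j + 1]):
                acc[b_col[t]] = acc.get(b_col[t], 0.0) + a_val[k] * b_val[t]
        for offset, col in enumerate(sorted(acc)):
            c_col[c_ptr[i] + offset] = col
            c_val[c_ptr[i] + offset] = acc[col]

    return c_ptr, c_col, c_val
```

For example, with A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 1], [4, 0], [5, 6]], calling `spgemm_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [0, 1, 2, 4], [1, 0, 0, 1], [1.0, 4.0, 5.0, 6.0])` yields the CSR form of [[10, 13], [12, 0]]. The irregularity mentioned in the abstract shows up in the inner loops, whose trip counts depend entirely on the sparsity pattern of the inputs.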

Code & Models

Repositories

lorentzbf/OpSparse (Official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Distributed and Parallel Computing Systems