High Performance Unstructured SpMM Computation Using Tensor Cores

Patrik Okanovic; Grzegorz Kwasniewski; Paolo Sylos Labini; Maciej Besta; Flavio Vella; Torsten Hoefler

arXiv:2408.11551 · cs.DC · August 22, 2024

TL;DR

This paper introduces SMaT, a GPU library that uses Tensor Cores for high-performance sparse matrix-matrix multiplication (SpMM) on unstructured sparse matrices. It outperforms state-of-the-art libraries by up to 125x, and by 2.6x on average.

Contribution

The paper presents SMaT, a novel SpMM library that effectively utilizes Tensor Cores for unstructured sparsity, with algorithmic optimizations and CUDA MMA API integration.
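To make the block-sparse idea concrete, here is a minimal pure-Python sketch. It is an illustration only, not SMaT's storage format or kernel: the helper names `to_blocks` and `block_spmm`, the dictionary layout, and the assumption that matrix dimensions are multiples of the block size are all hypothetical. The point it shows is that only blocks containing at least one non-zero are stored and multiplied. On a GPU, each such block would correspond to one dense tile handed to a Tensor Core operation.

```python
def to_blocks(A, bs):
    """Split dense matrix A into bs x bs tiles and keep only non-empty ones.

    Assumes (for this sketch) that both dimensions of A are multiples of bs.
    Returns a dict mapping (block_row, block_col) -> tile (list of lists).
    """
    blocks = {}
    for bi in range(0, len(A), bs):
        for bj in range(0, len(A[0]), bs):
            tile = [row[bj:bj + bs] for row in A[bi:bi + bs]]
            if any(v != 0 for row in tile for v in row):
                blocks[(bi // bs, bj // bs)] = tile
    return blocks


def block_spmm(blocks, B, n_rows, bs):
    """Compute C = A @ B using only the stored non-zero blocks of A."""
    n_out = len(B[0])
    C = [[0.0] * n_out for _ in range(n_rows)]
    for (bi, bj), tile in blocks.items():
        # Each stored tile is multiplied densely, zeros inside it included.
        for i in range(bs):
            for k in range(bs):
                a = tile[i][k]
                for j in range(n_out):
                    C[bi * bs + i][j] += a * B[bj * bs + k][j]
    return C


A = [
    [1, 2, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 4, 0],
]
B = [
    [1, 0],
    [0, 1],
    [1, 1],
    [2, 0],
]
blocks = to_blocks(A, bs=2)
C = block_spmm(blocks, B, n_rows=len(A), bs=2)
```

In this example only two of the four 2x2 blocks of `A` are stored, so half of the tile multiplications are skipped. The zeros inside a stored tile are still multiplied, which is why the number of non-zero blocks, rather than the number of non-zero elements, determines the work.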

Findings

01 SMaT outperforms state-of-the-art libraries by up to 125x.

02 SMaT achieves an average speedup of 2.6x over existing solutions.

03 The paper demonstrates SMaT's applicability to scientific computing and large-model workloads.

Abstract

High-performance sparse matrix-matrix (SpMM) multiplication is paramount for science and industry, as the ever-increasing sizes of data prohibit using dense data structures. Yet, existing hardware, such as Tensor Cores (TC), is ill-suited for SpMM, as it imposes strict constraints on data structures that cannot be met by unstructured sparsity found in many applications. To address this, we introduce (S)parse (Ma)trix Matrix (T)ensor Core-accelerated (SMaT): a novel SpMM library that utilizes TCs for unstructured sparse matrices. Our block-sparse library leverages the low-level CUDA MMA (matrix-matrix-accumulate) API, maximizing the performance offered by modern GPUs. Algorithmic optimizations such as sparse matrix permutation further improve performance by minimizing the number of non-zero blocks. The evaluation on NVIDIA A100 shows that SMaT outperforms SotA libraries (DASP, cuSPARSE,…
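The effect of permutation on the number of non-zero blocks can be seen in a small sketch. The heuristic below (sorting rows by the block columns they touch) is a hypothetical stand-in chosen for brevity and is not the permutation algorithm used by SMaT. The function names and the row-as-set-of-columns representation are likewise assumptions made for this illustration.

```python
def count_nonzero_blocks(rows, block_h, block_w):
    """Count blocks that contain at least one non-zero.

    rows[r] is the set of column indices holding non-zeros in row r.
    """
    blocks = set()
    for r, cols in enumerate(rows):
        for c in cols:
            blocks.add((r // block_h, c // block_w))
    return len(blocks)


def permute_rows_by_pattern(rows, block_w):
    """Hypothetical heuristic: group rows that touch the same block columns."""
    def pattern(i):
        return sorted({c // block_w for c in rows[i]})

    order = sorted(range(len(rows)), key=pattern)
    return order, [rows[i] for i in order]


# A 4 x 8 matrix whose rows alternate between two column patterns.
rows = [{0, 1}, {6, 7}, {0, 1}, {6, 7}]

before = count_nonzero_blocks(rows, block_h=2, block_w=2)
order, permuted = permute_rows_by_pattern(rows, block_w=2)
after = count_nonzero_blocks(permuted, block_h=2, block_w=2)
print(before, after, order)
```

With 2x2 blocks, the original row order spreads the non-zeros over four blocks, while grouping similar rows packs them into two. Permuting the rows of the sparse operand only permutes the rows of the product, so the original order can be restored afterwards.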

Code & Models

Repositories

spcl/smat (official)

Taxonomy

Topics: Multimedia Learning Systems