VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores

Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, Torsten Hoefler

arXiv:2310.02065·cs.DC·October 4, 2023

TL;DR

This paper introduces the V:N:M sparse tensor format and Spatha library, enabling higher sparsity ratios and significant speedups on NVIDIA's Sparse Tensor Cores for deep learning models.

Contribution

The paper proposes the V:N:M format, which enables arbitrary N:M sparsity on NVIDIA's Sparse Tensor Cores (SPTCs), and introduces Spatha, a high-performance library for efficient sparse tensor operations.

Findings

1. Spatha achieves up to 37x speedup over cuBLAS.
2. The V:N:M format supports arbitrary N:M ratios on SPTCs.
3. High sparsity ratios are reached with minimal accuracy loss in transformers.

Abstract

The increasing success and scaling of Deep Learning models demand higher computational efficiency and power. Sparsification can lead to both smaller models and higher compute efficiency, and accelerated hardware is becoming available. However, exploiting it efficiently requires kernel implementations, pruning algorithms, and storage formats that make use of the hardware support offered by specialized sparse vector units. One example is NVIDIA's Sparse Tensor Cores (SPTCs), which promise a 2x speedup. However, SPTCs only support the 2:4 format, limiting achievable sparsity ratios to 50%. We present the V:N:M format, which enables the execution of arbitrary N:M ratios on SPTCs. To efficiently exploit the resulting format, we propose Spatha, a high-performance sparse library for DL routines. We show that Spatha achieves up to 37x speedup over cuBLAS. We also demonstrate a second-order…
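
The abstract's point that the 2:4 format caps sparsity at 50% follows from how an N:M pattern works: in every group of M consecutive weights, only N are kept, so the removed fraction is 1 - N/M. The sketch below illustrates this with plain magnitude-based selection on a single row. It is an illustration only, not Spatha's kernels, the V:N:M storage layout, or the paper's pruning method; the names `nm_prune_row` and `sparsity_ratio` and the example ratio 2:8 are assumptions made for this example.

```python
def nm_prune_row(row, n=2, m=4):
    """Zero all but the n largest-magnitude values in each group of m entries.

    Hypothetical helper for illustration; plain magnitude-based selection.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    pruned = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Positions of the n largest-magnitude entries in this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def sparsity_ratio(n, m):
    """Fraction of entries removed by an N:M pattern."""
    return 1.0 - n / m


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
print(nm_prune_row(row, n=2, m=4))  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.8, 0.0]
print(sparsity_ratio(2, 4))         # 0.5, the limit of the 2:4 format
print(sparsity_ratio(2, 8))         # 0.75, an example of a sparser N:M ratio
```

Ratios with a larger M relative to N, such as the 2:8 used here, are the kind of pattern that the abstract says the hardware's native 2:4 support cannot express on its own.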

Code & Models

Repositories

udc-gac/venom (PyTorch, official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Data Storage Technologies

Methods: Pruning