SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

Zihao Ye; Ruihang Lai; Junru Shao; Tianqi Chen; Luis Ceze

arXiv:2207.04606 · cs.LG · February 22, 2023 · 5 cites

TL;DR

SparseTIR introduces a composable abstraction for sparse tensor compilation in deep learning, enabling flexible formats and transformations that significantly improve performance across various operators and end-to-end workloads.

Contribution

The paper proposes SparseTIR, a sparse tensor compilation framework built on composable formats and composable transformations, which addresses the limitations of single-format and single-shot compilers.

Findings

01. Achieves 1.20-2.34x speedup on GPU for GNN operators

02. Attains 1.05-2.98x speedup for sparse attention operators

03. Provides 4.20-40.18x acceleration for RGCN inference

Abstract

Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating demands from new operators. Sparse tensor compilers simplify the development of operators, but efficient sparse compilation for deep learning remains challenging because a single sparse format cannot maximize hardware efficiency, and single-shot compilers cannot keep up with the latest hardware and system advances. In this paper, we observe that the key to addressing both of these challenges is to leverage composable formats and composable transformations. We propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads. SparseTIR constructs a search space over these composable…
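To make the idea of composable formats concrete, here is a minimal sketch in plain Python. It is not the SparseTIR API: the function names (`csr_spmv`, `split_csr`, `ell_spmv`, `composed_spmv`) and the choice of an ELL-plus-CSR split are assumptions made purely for illustration. The sketch stores the first `width` nonzeros of each row in a padded, fixed-width (ELL-style) block and leaves the remainder in CSR, then checks that the sum of the two partial products equals the single-format result.

```python
# Illustrative sketch only; names and the ELL+CSR split are hypothetical,
# not taken from SparseTIR.
from typing import List, Tuple

CSR = Tuple[List[int], List[int], List[float]]   # (indptr, indices, data)
ELL = Tuple[List[List[int]], List[List[float]]]  # (padded cols, padded vals)


def csr_spmv(csr: CSR, x: List[float]) -> List[float]:
    """y = A @ x for A stored in CSR."""
    indptr, indices, data = csr
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def split_csr(csr: CSR, width: int) -> Tuple[ELL, CSR]:
    """Put the first `width` nonzeros of each row in ELL, the rest in CSR."""
    indptr, indices, data = csr
    ell_cols, ell_vals = [], []
    rest_indptr, rest_indices, rest_data = [0], [], []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        cut = min(start + width, end)
        pad = width - (cut - start)  # zero-valued padding keeps rows equal length
        ell_cols.append(indices[start:cut] + [0] * pad)
        ell_vals.append(data[start:cut] + [0.0] * pad)
        rest_indices.extend(indices[cut:end])
        rest_data.extend(data[cut:end])
        rest_indptr.append(len(rest_indices))
    return (ell_cols, ell_vals), (rest_indptr, rest_indices, rest_data)


def ell_spmv(ell: ELL, x: List[float]) -> List[float]:
    """y = A @ x for A stored as fixed-width padded rows."""
    cols, vals = ell
    return [sum(v * x[c] for c, v in zip(rc, rv)) for rc, rv in zip(cols, vals)]


def composed_spmv(csr: CSR, x: List[float], width: int) -> List[float]:
    """Run each part in its own format and add the partial results."""
    ell, rest = split_csr(csr, width)
    return [a + b for a, b in zip(ell_spmv(ell, x), csr_spmv(rest, x))]


# 3x4 example with uneven row lengths (2, 1, and 4 nonzeros).
A: CSR = ([0, 2, 3, 7], [0, 2, 1, 0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
x = [1.0, 2.0, 3.0, 4.0]
assert composed_spmv(A, x, width=2) == csr_spmv(A, x)
```

The regular ELL part has a uniform inner loop, while the CSR remainder absorbs the long rows; which split (here, the value of `width`) performs best depends on the matrix and the hardware, which is the kind of choice a search over composable formats would explore.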

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Tensor decomposition and applications

Methods: Attention Is All You Need · Convolution · Linear Layer · Attention Dropout · Layer Normalization · Adam · Cosine Annealing · Linear Warmup With Cosine Annealing · Weight Decay · Softmax