Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators
Paolo Sylos Labini, Massimo Bernaschi, Francesco Silvestri, Flavio Vella

TL;DR
This paper demonstrates that tensor accelerators, traditionally used for dense tensor operations, can be effectively adapted for sparse matrix multiplication, achieving significant speed-ups with a novel blocking algorithm.
Contribution
The authors introduce a 1D blocking algorithm with theoretical guarantees that enables efficient sparse matrix multiplication on tensor accelerators, challenging the assumption that they are unsuitable for sparse data.
Findings
Achieved up to two orders of magnitude speed-up on real-world sparse matrices.
Developed a dense blocking method that exploits Nvidia Tensor Cores for sparse matrices.
Proved theoretical guarantees on the density of dense blocks from sparse matrices.
Abstract
Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computationally expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since their features are specifically designed for tensor algebra (typically dense matrix products), it is commonly assumed that they are not suitable for applications with sparse data. To challenge this viewpoint, we discuss methods and present solutions for accelerating sparse matrix multiplication on such architectures. In particular, we present a 1-dimensional blocking algorithm with theoretical guarantees on the density, which builds dense blocks from arbitrary sparse matrices. Experimental results show that, even for unstructured and highly sparse matrices, our block-based solution, which exploits Nvidia Tensor Cores, is faster than its sparse counterpart.…
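To make the idea of 1-dimensional blocking concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' algorithm: the greedy grouping rule, the Jaccard similarity threshold `tau`, the fixed `block_width`, and all function names are hypothetical choices made for this example. Rows that touch similar sets of column blocks are grouped together, and each group is then stored as dense blocks whose density can be measured.

```python
def row_patterns(rows, block_width):
    """Map each row (a set of nonzero column indices) to the set of
    column-block indices it touches. Assumes fixed-width column blocks."""
    return [frozenset(c // block_width for c in cols) for cols in rows]


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_rows(rows, block_width, tau):
    """Greedy 1D grouping (hypothetical rule): a row joins the current group
    when its pattern is at least `tau`-similar to the group's merged pattern.
    Returns a list of (row indices, sorted column-block indices)."""
    patterns = row_patterns(rows, block_width)
    ungrouped = list(range(len(rows)))
    groups = []
    while ungrouped:
        seed = ungrouped.pop(0)
        members = [seed]
        pattern = set(patterns[seed])
        remaining = []
        for r in ungrouped:
            if jaccard(patterns[r], pattern) >= tau:
                members.append(r)
                pattern |= patterns[r]  # the group pattern grows as rows join
            else:
                remaining.append(r)
        ungrouped = remaining
        groups.append((members, sorted(pattern)))
    return groups


def block_density(rows, groups, block_width):
    """Fraction of stored entries that are true nonzeros, assuming every
    block is padded to len(members) x block_width."""
    nnz = sum(len(rows[r]) for members, _ in groups for r in members)
    area = sum(len(members) * len(pattern) * block_width
               for members, pattern in groups)
    return nnz / area if area else 0.0


# Toy matrix: each row is the set of its nonzero column indices.
example_rows = [{0, 1}, {8, 9}, {1, 2}, {9, 10}, {0, 3}]
example_groups = group_rows(example_rows, block_width=4, tau=0.5)
example_density = block_density(example_rows, example_groups, block_width=4)
```

In this toy case rows 0, 2, and 4 share column block 0 and rows 1 and 3 share column block 2, so the reordered matrix is covered by two dense blocks. Raising `tau` in this sketch tends to yield denser but smaller groups, while lowering it yields larger groups with more explicit zeros.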
Taxonomy
Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques · Computational Physics and Python Applications
