DISTAL: The Distributed Tensor Algebra Compiler
Rohan Yadav, Alex Aiken, Fredrik Kjolstad

TL;DR
DISTAL is a compiler for dense tensor algebra that targets distributed, heterogeneous systems. Its generated code is competitive with optimized codes for matrix multiplication and outperforms existing systems by 1.8x to 3.7x on higher-order tensor operations.
Contribution
DISTAL introduces a compiler framework in which data distribution and computation distribution are specified independently, through separate format and scheduling languages, for tensor algebra on distributed heterogeneous systems.
Findings
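Achieves performance competitive with optimized codes for matrix multiplication on 256 nodes of the Lassen supercomputer
Outperforms existing systems by 1.8x to 3.7x (with a 45.7x outlier) on higher-order tensor operations
Supports a wide design space that includes both classical algorithms (e.g., Cannon's algorithm) and modern ones (e.g., COSMA)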
Abstract
We introduce DISTAL, a compiler for dense tensor algebra that targets modern distributed and heterogeneous systems. DISTAL lets users independently describe how tensors and computation map onto target machines through separate format and scheduling languages. The combination of choices for data and computation distribution creates a large design space that includes many algorithms from both the past (e.g., Cannon's algorithm) and the present (e.g., COSMA). DISTAL compiles a tensor algebra domain-specific language to a distributed task-based runtime system and supports nodes with multi-core CPUs and multiple GPUs. Code generated by DISTAL is competitive with optimized codes for matrix multiply on 256 nodes of the Lassen supercomputer and outperforms existing systems by 1.8x to 3.7x (with a 45.7x outlier) on higher-order tensor operations.
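To make the design space concrete, here is a minimal sketch of Cannon's algorithm, one of the classical algorithms the abstract names. It is a single-process simulation in plain Python, not DISTAL's input language or its generated code: the p x p grid of "processors" is modeled as a nested list of tiles, and the function names (`split`, `join`, `cannon`) are hypothetical, chosen only for this illustration. The tiling step stands in for the data distribution, while the skew and shift steps stand in for the communication pattern that accompanies the computation distribution.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m, p):
    """Data distribution: cut an n x n matrix into a p x p grid of tiles."""
    s = len(m) // p
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(p)] for i in range(p)]


def join(tiles):
    """Inverse of split: reassemble a grid of tiles into one matrix."""
    out = []
    for tile_row in tiles:
        for r in range(len(tile_row[0])):
            out.append([x for tile in tile_row for x in tile[r]])
    return out


def cannon(a, b, p):
    """Simulate Cannon's algorithm for C = A * B on a p x p processor grid.

    Assumes square n x n inputs with n divisible by p.
    """
    n = len(a)
    assert n % p == 0, "matrix size must be divisible by the grid size"
    s = n // p
    A, B = split(a, p), split(b, p)

    # Initial skew: row i of A rotates left by i, column j of B rotates up by j,
    # so processor (i, j) starts with tiles A[i][(i+j) % p] and B[(i+j) % p][j].
    A = [[A[i][(j + i) % p] for j in range(p)] for i in range(p)]
    B = [[B[(i + j) % p][j] for j in range(p)] for i in range(p)]

    # Each processor owns one output tile, initialized to zero.
    C = [[[[0] * s for _ in range(s)] for _ in range(p)] for _ in range(p)]

    for _ in range(p):
        # Local step: every processor multiplies the two tiles it holds.
        C = [[add(C[i][j], matmul(A[i][j], B[i][j])) for j in range(p)]
             for i in range(p)]
        # Communication step: A tiles move one position left, B tiles one up.
        A = [[A[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        B = [[B[(i + 1) % p][j] for j in range(p)] for i in range(p)]

    return join(C)
```

For example, calling `cannon(a, b, 2)` on two 4 x 4 integer matrices should return the same result as `matmul(a, b)`. In a real distributed setting each tile would live on a different node and the shifts would be network transfers; this sketch only mirrors the data movement pattern.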
