DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution
Keshav Santhanam, Siddharth Krishna, Ryota Tomioka, Tim Harris, Matei Zaharia

TL;DR
DistIR introduces an expressive intermediate representation and simulator for distributed neural network computation, enabling rapid exploration of complex distribution strategies and significantly reducing optimization time without physical hardware execution.
Contribution
The paper presents DistIR, a novel intermediate representation that can naturally express diverse distribution strategies and facilitate efficient simulation and optimization of distributed DNN training.
Findings
Enables fast grid search over 1000+ configurations
Reduces optimization time by an order of magnitude
Supports various distribution strategies including pipeline parallelism
Abstract
The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, tensor-model, pipeline parallelism, and hybrid combinations thereof. Each of these strategies offers its own trade-offs and exhibits optimal performance across different models and hardware topologies. Selecting the best set of strategies for a given setup is challenging because the search space grows combinatorially, and debugging and testing on clusters is expensive. In this work we propose DistIR, an expressive intermediate representation for distributed DNN computation that is tailored for efficient analyses, such as simulation. This enables automatically identifying the top-performing strategies without having to execute on physical hardware. Unlike prior work, DistIR can naturally express many distribution strategies including pipeline…
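To make the combinatorial search space concrete, here is a minimal Python sketch of a grid search over hybrid strategies. It is not DistIR's API or simulator: the function names (`factorizations`, `toy_step_time`, `grid_search`), the cost formula, and the `compute` and `comm` constants are all hypothetical, chosen only to illustrate ranking configurations with a cost model instead of running them on hardware.

```python
from itertools import product


def factorizations(world_size):
    """Yield every (dp, tp, pp) triple of positive integers with dp * tp * pp == world_size."""
    divisors = [d for d in range(1, world_size + 1) if world_size % d == 0]
    for dp, tp in product(divisors, repeat=2):
        if world_size % (dp * tp) == 0:
            yield dp, tp, world_size // (dp * tp)


def toy_step_time(dp, tp, pp, microbatches, compute=1.0, comm=0.05):
    """Hypothetical analytic cost model (an assumption, not DistIR's simulator).

    compute: time for one step on a single device (arbitrary units).
    comm:    weight of the collective-communication penalty.
    """
    per_device = compute / (dp * tp * pp)        # ideal speedup from splitting work
    bubble = (pp - 1) / microbatches             # relative pipeline idle overhead
    collectives = comm * ((dp - 1) / dp + (tp - 1) / tp)  # ring-style all-reduce factor
    return per_device * (1 + bubble) + collectives


def grid_search(world_size, microbatch_choices):
    """Score every configuration with the toy model and return them fastest first."""
    scored = [
        (toy_step_time(dp, tp, pp, m), (dp, tp, pp, m))
        for dp, tp, pp in factorizations(world_size)
        for m in microbatch_choices
    ]
    return sorted(scored)


if __name__ == "__main__":
    ranking = grid_search(world_size=8, microbatch_choices=[1, 2, 4])
    for cost, (dp, tp, pp, m) in ranking[:3]:
        print(f"dp={dp} tp={tp} pp={pp} microbatches={m} -> {cost:.4f}")
```

Even with 8 devices and three microbatch choices the grid holds 30 configurations, and it grows quickly with device count and extra tuning knobs, which is why scoring candidates with a simulator rather than a cluster run matters.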
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Weight Decay · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · Residual Connection · Adam
