DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution

Keshav Santhanam; Siddharth Krishna; Ryota Tomioka; Tim Harris; Matei Zaharia

arXiv:2111.05426 · cs.LG · November 11, 2021 · 1 citation

TL;DR

DistIR introduces an expressive intermediate representation and simulator for distributed neural network computation, enabling rapid exploration of complex distribution strategies and significantly reducing optimization time without physical hardware execution.

Contribution

The paper presents DistIR, a novel intermediate representation that can naturally express diverse distribution strategies and facilitate efficient simulation and optimization of distributed DNN training.

Findings

01 Enables fast grid search over 1000+ configurations

02 Reduces optimization time by an order of magnitude

03 Supports various distribution strategies including pipeline parallelism

Abstract

The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, tensor-model, pipeline parallelism, and hybrid combinations thereof. Each of these strategies offers its own trade-offs and exhibits optimal performance across different models and hardware topologies. Selecting the best set of strategies for a given setup is challenging because the search space grows combinatorially, and debugging and testing on clusters is expensive. In this work we propose DistIR, an expressive intermediate representation for distributed DNN computation that is tailored for efficient analyses, such as simulation. This enables automatically identifying the top-performing strategies without having to execute on physical hardware. Unlike prior work, DistIR can naturally express many distribution strategies including pipeline…
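To make the combinatorial search space concrete, the following is a minimal sketch of a grid search over hybrid strategies. It is not DistIR's IR or simulator: the function names (`factorizations`, `toy_step_time`, `grid_search`), the cost constants, and the cost model itself are all hypothetical, chosen only to illustrate ranking configurations with an analytical estimate instead of running them on hardware.

```python
from itertools import product


def factorizations(num_devices):
    """Yield every (dp, tp, pp) split whose product equals num_devices.

    dp, tp, pp are the data-, tensor-model-, and pipeline-parallel degrees.
    """
    for dp, tp in product(range(1, num_devices + 1), repeat=2):
        if num_devices % (dp * tp) == 0:
            yield dp, tp, num_devices // (dp * tp)


def toy_step_time(dp, tp, pp, microbatches=8,
                  compute=1.0, allreduce=0.05, tp_comm=0.02):
    """Hypothetical per-step cost estimate (an assumption, not DistIR's model).

    compute   : single-device time for one full batch
    allreduce : flat gradient-synchronization cost paid when dp > 1
    tp_comm   : communication cost per additional tensor-parallel rank
    """
    per_device = compute / (dp * tp * pp)
    # GPipe-style bubble: pp - 1 idle slots spread over the microbatches.
    pipeline = per_device * (1 + (pp - 1) / microbatches)
    comm = allreduce * (dp > 1) + tp_comm * (tp - 1)
    return pipeline + comm


def grid_search(num_devices):
    """Return the (dp, tp, pp) split with the lowest estimated step time."""
    return min(factorizations(num_devices), key=lambda c: toy_step_time(*c))


if __name__ == "__main__":
    for config in sorted(factorizations(8), key=lambda c: toy_step_time(*c)):
        print(config, round(toy_step_time(*config), 4))
```

Even with only three degrees and eight devices this enumerates ten configurations. Adding microbatch counts, batch sizes, and model sizes multiplies the count quickly, which is why a fast estimate per configuration matters.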

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Weight Decay · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · Residual Connection · Adam