PipeDream: Fast and Efficient Pipeline Parallel DNN Training
Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil Gibbons

TL;DR
PipeDream is a pipeline-parallel training system for DNNs on GPUs that cuts communication by up to 95% for large DNNs relative to data-parallel training and overlaps communication with computation to speed up training.
Contribution
It presents a novel pipeline parallel training system that reduces communication, balances GPU workload, and accelerates DNN training compared to traditional data-parallel methods.
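To make the workload-balancing idea concrete, here is a minimal sketch of splitting a model's layers into contiguous pipeline stages so that the slowest stage is as fast as possible. It is a simplification and not PipeDream's actual partitioner: it ignores communication cost and stage replication, and the per-layer times and the name `partition_layers` are hypothetical.

```python
from functools import lru_cache


def partition_layers(layer_times, num_stages):
    """Split layers into contiguous stages, minimizing the slowest stage's time.

    Assumes 1 <= num_stages <= len(layer_times). Compute time only;
    communication between stages is deliberately ignored in this sketch.
    """
    n = len(layer_times)
    # prefix[i] holds the total time of layers[0:i].
    prefix = [0.0]
    for t in layer_times:
        prefix.append(prefix[-1] + t)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Best (bottleneck, cut points) for layers[i:] using exactly k stages.
        if k == 1:
            return prefix[n] - prefix[i], ()
        result = (float("inf"), ())
        # The first stage takes layers[i:j]; keep at least k - 1 layers for the rest.
        for j in range(i + 1, n - k + 2):
            first = prefix[j] - prefix[i]
            rest, cuts = best(j, k - 1)
            candidate = (max(first, rest), (j,) + cuts)
            if candidate[0] < result[0]:
                result = candidate
        return result

    bottleneck, cuts = best(0, num_stages)
    bounds = (0,) + cuts + (n,)
    stages = [list(range(bounds[s], bounds[s + 1])) for s in range(num_stages)]
    return bottleneck, stages


# Hypothetical per-layer compute times (arbitrary units) split across 3 GPUs.
bottleneck, stages = partition_layers([4, 2, 3, 1, 5, 3], 3)
print(bottleneck, stages)
```

The bottleneck stage sets the pipeline's steady-state throughput, which is why the sketch minimizes the maximum stage time and not the average.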
Findings
Up to 95% reduction in communication for large DNNs.
Up to 5x faster time-to-accuracy than data-parallel training in experiments.
Effective scheduling and partitioning improve GPU utilization.
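The communication saving can be illustrated with back-of-the-envelope arithmetic. In data-parallel training each worker exchanges gradients for every parameter on each minibatch, while a pipeline stage only passes activations forward and their gradients backward across its boundary. The sketch below uses made-up sizes (the parameter count, activation size, and batch size are all assumptions), so its output does not reproduce the paper's measurements.

```python
BYTES_PER_FLOAT = 4  # assuming 32-bit floats


def data_parallel_bytes(num_params):
    """Bytes of gradients one worker sends per minibatch (one float per parameter)."""
    return num_params * BYTES_PER_FLOAT


def pipeline_bytes(boundary_elems_per_sample, batch_size):
    """Bytes crossing one stage boundary per minibatch.

    Counts activations on the forward pass plus equally sized
    gradients on the backward pass.
    """
    return 2 * boundary_elems_per_sample * batch_size * BYTES_PER_FLOAT


# Hypothetical model: 100M parameters, 25k activation values per sample
# at the stage boundary, minibatch of 32 samples.
dp = data_parallel_bytes(100_000_000)
pp = pipeline_bytes(25_000, 32)
reduction = 1 - pp / dp
print(f"data-parallel: {dp / 1e6:.1f} MB, pipeline: {pp / 1e6:.1f} MB, "
      f"reduction: {reduction:.1%}")
```

The saving grows with the ratio of parameter count to boundary activation size, which is why the benefit is largest for large models.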
Abstract
PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes computation by pipelining execution across multiple machines. Its pipeline parallel computing model avoids the slowdowns faced by data-parallel training when large models and/or limited network bandwidth induce high communication-to-computation ratios. PipeDream reduces communication by up to 95% for large DNNs relative to data-parallel training, and allows perfect overlap of communication and computation. PipeDream keeps all available GPUs productive by systematically partitioning DNN layers among them to balance work and minimize communication, versions model parameters for backward pass correctness, and schedules the forward and backward passes of different inputs in round-robin fashion to optimize "time to target accuracy". Experiments with five different DNNs on two different clusters show that…
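The scheduling and parameter-versioning ideas in the abstract can be sketched for a single pipeline stage. The code below is an illustration under stated assumptions, not PipeDream's implementation: it assumes a warm-up of `num_stages - stage` forward passes before the stage starts alternating backward and forward work, and it models parameter versions as a plain integer counter. The names `stage_schedule` and `run_stage` are hypothetical.

```python
def stage_schedule(stage, num_stages, num_minibatches):
    """Order of forward ("F") and backward ("B") passes for one stage.

    Assumption: the stage first admits (num_stages - stage) forward passes,
    then alternates one backward pass with one forward pass.
    """
    warmup = min(num_stages - stage, num_minibatches)
    ops = [("F", m) for m in range(warmup)]
    next_fwd, next_bwd = warmup, 0
    while next_bwd < num_minibatches:
        ops.append(("B", next_bwd))
        next_bwd += 1
        if next_fwd < num_minibatches:
            ops.append(("F", next_fwd))
            next_fwd += 1
    return ops


def run_stage(ops):
    """Replay a schedule, recording which parameter version each pass uses.

    The forward pass stashes the current version; the backward pass for the
    same minibatch reuses that stashed version, then the update bumps the
    live version by one.
    """
    version = 0
    stash = {}
    log = []
    for kind, minibatch in ops:
        if kind == "F":
            stash[minibatch] = version
            log.append((kind, minibatch, version))
        else:
            used = stash.pop(minibatch)
            log.append((kind, minibatch, used))
            version += 1  # weights are updated after each backward pass
    return log


# First stage of a 3-stage pipeline processing 4 minibatches.
for entry in run_stage(stage_schedule(0, 3, 4)):
    print(entry)
```

Without the stash, the backward pass for minibatch 3 in this example would run against parameters that had been updated after its forward pass, so its gradient would be computed on different weights from those that produced its activations.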
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
