PipeDream: Fast and Efficient Pipeline Parallel DNN Training

Aaron Harlap; Deepak Narayanan; Amar Phanishayee; Vivek Seshadri; Nikhil Devanur; Greg Ganger; Phil Gibbons

arXiv:1806.03377·cs.DC·June 12, 2018·97 cites

TL;DR

PipeDream is a pipeline-parallel DNN training system for GPUs that cuts communication by up to 95% relative to data-parallel training and overlaps the remaining communication with computation.

Contribution

It presents a novel pipeline parallel training system that reduces communication, balances GPU workload, and accelerates DNN training compared to traditional data-parallel methods.

Findings

01. Up to 95% reduction in communication for large DNNs.
02. Up to 5x faster time-to-accuracy in experiments.
03. Effective scheduling and partitioning improve GPU utilization.

Abstract

PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes computation by pipelining execution across multiple machines. Its pipeline parallel computing model avoids the slowdowns faced by data-parallel training when large models and/or limited network bandwidth induce high communication-to-computation ratios. PipeDream reduces communication by up to 95% for large DNNs relative to data-parallel training, and allows perfect overlap of communication and computation. PipeDream keeps all available GPUs productive by systematically partitioning DNN layers among them to balance work and minimize communication, versions model parameters for backward pass correctness, and schedules the forward and backward passes of different inputs in round-robin fashion to optimize "time to target accuracy". Experiments with five different DNNs on two different clusters show that…
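
The abstract's idea of partitioning DNN layers among GPUs to balance work can be illustrated with a small sketch. This is not PipeDream's partitioner: it ignores communication cost entirely and only splits layers into contiguous stages so that the slowest stage is as fast as possible. The function name `partition_layers`, the parameter `layer_times`, and the per-layer timings in the usage note are all hypothetical.

```python
from typing import List, Tuple


def partition_layers(layer_times: List[float], num_stages: int) -> Tuple[float, List[List[int]]]:
    """Split layers into contiguous stages, minimizing the slowest stage's total time.

    Simplified illustration only: compute time is balanced, communication is ignored.
    """
    n = len(layer_times)
    if not 1 <= num_stages <= n:
        raise ValueError("num_stages must be between 1 and the number of layers")

    # prefix[i] = total time of the first i layers, so a stage covering
    # layers j..i-1 costs prefix[i] - prefix[j].
    prefix = [0.0]
    for t in layer_times:
        prefix.append(prefix[-1] + t)

    inf = float("inf")
    # best[k][i]: smallest achievable bottleneck when the first i layers form k stages.
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    # cut[k][i]: index where the last of those k stages begins.
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0.0

    for k in range(1, num_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                candidate = max(best[k - 1][j], prefix[i] - prefix[j])
                if candidate < best[k][i]:
                    best[k][i] = candidate
                    cut[k][i] = j

    # Walk the recorded cut points backwards to recover the stage boundaries.
    stages: List[List[int]] = []
    end = n
    for k in range(num_stages, 0, -1):
        start = cut[k][end]
        stages.append(list(range(start, end)))
        end = start
    stages.reverse()
    return best[num_stages][n], stages
```

With made-up per-layer times `[4, 2, 2, 4, 6, 6]` and three stages, the sketch returns a bottleneck of 10 and the stages `[[0, 1, 2], [3, 4], [5]]`. A partitioner that also minimizes communication, as the abstract describes, would need an additional cost term for the data crossing each stage boundary.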

Code & Models

Repositories

RithvikChan/PipeGAN (tf)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques