GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen

TL;DR
GPipe introduces a pipeline parallelism method enabling efficient training of extremely large neural networks across multiple accelerators, improving scalability and performance for diverse tasks.
Contribution
The paper presents GPipe, a flexible pipeline parallelism library with a novel batch-splitting pipelining algorithm that achieves almost linear speedup when a model is partitioned across multiple accelerators.
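As a rough illustration of why splitting a mini-batch into micro-batches helps, the sketch below simulates a forward-only pipeline. It rests on simplifying assumptions that are not taken from the paper: every stage takes exactly one time step per micro-batch, communication is free, and the backward pass is ignored. The helpers `pipeline_schedule`, `idle_fraction`, and `speedup` are hypothetical names, not part of the GPipe library.

```python
def pipeline_schedule(num_stages, num_micro_batches):
    """Return {time_step: [(stage, micro_batch), ...]} for a forward-only pipeline.

    Micro-batch m reaches stage s at time step s + m, because it must first
    pass through the s earlier stages, one step each.
    """
    schedule = {}
    for stage in range(num_stages):
        for mb in range(num_micro_batches):
            schedule.setdefault(stage + mb, []).append((stage, mb))
    return schedule


def idle_fraction(num_stages, num_micro_batches):
    """Fraction of (stage, time step) slots in which a stage has nothing to do."""
    schedule = pipeline_schedule(num_stages, num_micro_batches)
    total_steps = max(schedule) + 1  # num_stages + num_micro_batches - 1
    busy_slots = num_stages * num_micro_batches
    return 1 - busy_slots / (num_stages * total_steps)


def speedup(num_stages, num_micro_batches):
    """Speedup over running every stage back to back on a single device."""
    schedule = pipeline_schedule(num_stages, num_micro_batches)
    total_steps = max(schedule) + 1
    sequential_steps = num_stages * num_micro_batches
    return sequential_steps / total_steps


if __name__ == "__main__":
    # Four stages, with an increasing number of micro-batches per mini-batch.
    for m in (1, 4, 32):
        print(m, round(idle_fraction(4, m), 3), round(speedup(4, m), 2))
```

In this idealized model, K stages and M micro-batches finish in M + K - 1 steps, so the idle fraction is (K - 1) / (M + K - 1). With a single micro-batch, three of the four stages are idle at any moment. With 32 micro-batches the idle fraction drops below 10% and the speedup approaches the number of stages.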
Findings
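Trained a 557-million-parameter AmoebaNet model that reaches 84.4% top-1 accuracy on ImageNet-2012.
Trained a single 6-billion-parameter, 128-layer multilingual Transformer on a corpus spanning over 100 languages, achieving better quality than all bilingual models.
Showed that the same library scales two different architectures: a convolutional network for image classification and a Transformer for multilingual machine translation.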
Abstract
Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks. To address the need for efficient and task-independent model parallelism, we introduce GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. By pipelining different sub-sequences of layers on separate accelerators, GPipe provides the flexibility of scaling a variety of different networks to gigantic sizes efficiently. Moreover, GPipe utilizes a novel batch-splitting pipelining algorithm, resulting in almost linear speedup when a model is…
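To make "a sequence of layers" split into sub-sequences concrete, here is a minimal single-process sketch. It is not GPipe's API: `partition_layers`, `split_batch`, `run_stage`, and `forward` are hypothetical names, and the layers are toy functions on integers. The stages are plain Python lists rather than accelerators, and layers are divided by count rather than by any cost estimate.

```python
def partition_layers(layers, num_stages):
    """Split a list of layers into contiguous stages of near-equal length."""
    base, extra = divmod(len(layers), num_stages)
    stages, start = [], 0
    for i in range(num_stages):
        size = base + (1 if i < extra else 0)  # first `extra` stages get one more layer
        stages.append(layers[start:start + size])
        start += size
    return stages


def split_batch(batch, num_micro_batches):
    """Split a non-empty mini-batch (a list of examples) into equal-size micro-batches."""
    if not batch or len(batch) % num_micro_batches != 0:
        raise ValueError("batch must be non-empty and divisible by num_micro_batches")
    size = len(batch) // num_micro_batches
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def run_stage(stage, micro_batch):
    """Apply every layer of one stage, in order, to each example."""
    for layer in stage:
        micro_batch = [layer(x) for x in micro_batch]
    return micro_batch


def forward(stages, batch, num_micro_batches):
    """Push each micro-batch through the stages in order and collect the outputs."""
    outputs = []
    for mb in split_batch(batch, num_micro_batches):
        for stage in stages:  # on real hardware, each stage would sit on its own accelerator
            mb = run_stage(stage, mb)
        outputs.extend(mb)
    return outputs


# Toy "network": five layers acting on integers.
layers = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
    lambda x: x + 10,
]
stages = partition_layers(layers, num_stages=2)  # stage sizes: 3 and 2
result = forward(stages, batch=[0, 1, 2, 3], num_micro_batches=2)
print(result)  # [11, 11, 19, 35]
```

The loop here processes micro-batches one after another. The point of pipelining is that on separate devices, stage 1 can begin the second micro-batch while stage 2 is still working on the first. The partitioning and batch splitting are unchanged in that case.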
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · GPipe · Spatially Separable Convolution · Max Pooling · Convolution · Average Pooling · AmoebaNet · Residual Connection
