PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
Lei Guan, Dongsheng Li, Yongle Chen, Jiye Liang, Wenjian Wang, Xicheng Lu

TL;DR
PipeOptim introduces an optimizer-dependent weight prediction method for asynchronous 1F1B pipeline training, effectively addressing weight inconsistency and staleness, thereby improving throughput and learning quality across various models and tasks.
Contribution
This paper proposes PipeOptim, a weight prediction strategy that ensures consistent weights during forward passes in 1F1B pipeline training. The prediction scheme is derived from the update rule of whichever optimizer is used to train the model.
Findings
Outperforms GPipe, PipeDream, and other pipelined approaches.
Maintains high throughput with effective weight consistency.
Valid across multiple models and tasks.
Abstract
Asynchronous pipeline model parallelism with a "1F1B" (one forward, one backward) schedule generates little bubble overhead and always provides quite a high throughput. However, the "1F1B" schedule inevitably leads to weight inconsistency and weight staleness issues due to the cross-training of different mini-batches across GPUs. To simultaneously address these two problems, in this paper, we propose an optimizer-dependent weight prediction strategy (a.k.a. PipeOptim) for asynchronous pipeline training. The key insight of our proposal is that we employ a weight prediction strategy in the forward pass to ensure that each mini-batch uses consistent and staleness-free weights to compute the forward pass. To be concrete, we first construct the weight prediction scheme based on the update rule of the used optimizer when training the deep neural network models. Then throughout the "1F1B"…
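The idea of predicting future weights from the optimizer's update rule can be illustrated with a small sketch. The abstract above is truncated and does not state the prediction formula, so everything here is an assumption for illustration: the optimizer is taken to be momentum SGD, the function names `sgd_momentum_step` and `predict_weights` are hypothetical, and the prediction simply assumes the next `s` updates all follow the current update direction.

```python
# Sketch only: the prediction rule below is an illustrative assumption,
# not the formula from the (truncated) abstract above.

def sgd_momentum_step(w, v, grad, lr, momentum):
    """Apply one momentum-SGD update and return the new (weights, velocity)."""
    v_new = [momentum * vi + gi for vi, gi in zip(v, grad)]
    w_new = [wi - lr * vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


def predict_weights(w, v, lr, s):
    """Predict the weights s updates ahead of the current ones.

    Assumption (hypothetical): each of the next s updates moves along the
    optimizer's current update direction v, so w_hat = w - lr * s * v.
    """
    return [wi - lr * s * vi for wi, vi in zip(w, v)]


if __name__ == "__main__":
    w = [1.0, -2.0]   # current weights held by a pipeline stage
    v = [0.5, 0.25]   # current momentum buffer (the update direction)
    # A forward pass whose gradients will be applied 3 updates later
    # would run on the predicted weights rather than the current ones.
    print(predict_weights(w, v, lr=0.1, s=3))
```

In this sketch the number of steps `s` stands in for how many weight updates occur between a mini-batch's forward pass and the application of its gradients. How that gap is determined for each pipeline stage is not covered by the visible part of the abstract.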
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques
Methods: GPipe · PipeDream · PipeDream-2BW
