DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin

TL;DR
DAPPLE is a novel training framework that combines data and pipeline parallelism with an optimized scheduler, significantly improving efficiency and reducing memory usage for large DNN models on GPU clusters.
Contribution
DAPPLE introduces a parallelization strategy planner and a runtime scheduler that together improve training efficiency and memory management for large models.
Findings
Outperforms PipeDream's planner by up to 3.23x in strategy effectiveness.
Achieves 1.6x training throughput speedup over GPipe.
Reduces memory consumption by 12% without sacrificing throughput.
Abstract
It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, there are still several tricky issues to address: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose DAPPLE, a synchronous training framework which combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner to solve the partition and placement problems, and explores the optimal hybrid strategy of data and pipeline parallelism. We also propose a new runtime scheduling algorithm to reduce device memory usage, which is orthogonal to re-computation approach and does not come at the expense of…
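The abstract does not spell out the scheduling algorithm, so the following is only a rough illustration of why the order of forward and backward steps can affect device memory. It assumes a simplified model in which a pipeline stage holds one set of activations per micro-batch from its forward step until its backward step. The function names, the `warmup` parameter, and the two schedules are hypothetical and are not taken from DAPPLE's implementation.

```python
def peak_stashed(schedule):
    """Peak number of micro-batches whose activations are held at once.

    `schedule` is a sequence of "F" (forward) and "B" (backward) steps
    executed by a single pipeline stage.
    """
    live = 0
    peak = 0
    for step in schedule:
        if step == "F":
            live += 1          # forward stashes one micro-batch's activations
        elif step == "B":
            live -= 1          # backward releases them
            if live < 0:
                raise ValueError("backward step without a matching forward")
        else:
            raise ValueError(f"unknown step: {step!r}")
        peak = max(peak, live)
    return peak


def all_forward_then_backward(num_micro_batches):
    """Run every forward step first, then every backward step."""
    return ["F"] * num_micro_batches + ["B"] * num_micro_batches


def interleaved(num_micro_batches, warmup):
    """Run `warmup` forwards, then alternate backward/forward (assumed schedule)."""
    if warmup < 1:
        raise ValueError("warmup must be at least 1")
    warmup = min(warmup, num_micro_batches)
    schedule = ["F"] * warmup
    for _ in range(num_micro_batches - warmup):
        schedule += ["B", "F"]  # free one micro-batch before admitting the next
    schedule += ["B"] * warmup  # drain the remaining micro-batches
    return schedule


if __name__ == "__main__":
    m = 8
    print(peak_stashed(all_forward_then_backward(m)))
    print(peak_stashed(interleaved(m, warmup=2)))
```

Under this toy model, the first schedule's peak grows with the number of micro-batches, while the interleaved one is bounded by the warm-up depth. Real memory behavior also depends on stage placement, activation sizes, and communication, none of which are modeled here.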
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Ferroelectric and Negative Capacitance Devices
