DawnPiper: A Memory-scalable Pipeline Parallel Training Framework

Xuan Peng; Xuanhua Shi; Haolin Zhang; Yunfei Zhao; Xuehai Qian

arXiv:2505.05856 · cs.DC · May 12, 2025

TL;DR

DawnPiper is a pipeline parallel training framework that optimizes memory usage and model partitioning, enabling larger trainable models and faster training of large-scale neural networks.

Contribution

It introduces a deep learning (DL) compilation-based profiling method and a performance-optimal theorem for memory-efficient pipeline partitioning, improving both memory scalability and training efficiency.

Findings

01. Up to 4x increase in trainable batch size over vPipe
02. Up to 11x increase in trainable batch size over PipeDream
03. 1.5x performance speedup over vPipe

Abstract

Pipeline parallelism is a crucial paradigm for large-scale model training. However, imbalances in memory footprint across stages can lead to significant GPU memory waste, limiting the model sizes that pipeline parallelism can effectively support. In this paper, we introduce DawnPiper, a memory-scalable pipeline parallel training framework. First, we develop a DL compilation-based profiling method that transforms the model into a fine-grained computation graph. This refinement enables finer-grained model partitioning and memory optimization while facilitating automatic code generation. Based on observed memory usage characteristics, we derive a performance-optimal theorem for pipeline parallel partitioning that substantially reduces the partition search space. Second, we propose a binary pipeline partitioning algorithm and utilize a cost-model-based memory optimization…
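To make the partitioning idea and the memory-imbalance problem concrete, here is a minimal sketch. It is a hypothetical illustration, not DawnPiper's published algorithm: it assumes the profiled model is a linear sequence of operators with known compute time and memory, reads "binary partitioning" as recursive bisection with a binary search for each cut, and uses made-up names (`best_split`, `partition`, `stage_sums`, `over_budget`) and illustrative numbers. Note that balancing compute time alone can leave one stage over a per-GPU memory budget, which is the kind of imbalance the abstract describes.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence, Tuple

# Hypothetical sketch, NOT DawnPiper's actual algorithm.
Stage = Tuple[int, int]  # half-open operator range [lo, hi)


def best_split(prefix: Sequence[float], lo: int, hi: int, min_len: int) -> int:
    """Pick the cut k that best balances compute time of ops[lo:k] and
    ops[k:hi], keeping at least min_len ops on each side.
    prefix[i] is the total time of ops[:i] (times must be non-negative)."""
    first, last = lo + min_len, hi - min_len  # feasible cut range
    target = (prefix[lo] + prefix[hi]) / 2
    # Binary search: first feasible cut whose left side reaches half the time.
    k = min(bisect_left(prefix, target, first, last + 1), last)

    def imbalance(c: int) -> float:
        return abs((prefix[c] - prefix[lo]) - (prefix[hi] - prefix[c]))

    # The best cut is either k or the one just before it.
    if k - 1 >= first and imbalance(k - 1) <= imbalance(k):
        return k - 1
    return k


def _bisect(prefix: Sequence[float], lo: int, hi: int, stages: int) -> List[Stage]:
    if stages == 1:
        return [(lo, hi)]
    half = stages // 2
    k = best_split(prefix, lo, hi, half)  # each half needs >= `half` ops
    return _bisect(prefix, lo, k, half) + _bisect(prefix, k, hi, half)


def partition(times: Sequence[float], num_stages: int) -> List[Stage]:
    """Recursively bisect a linear operator sequence into num_stages stages."""
    if num_stages < 1 or num_stages & (num_stages - 1):
        raise ValueError("num_stages must be a power of two")
    if num_stages > len(times):
        raise ValueError("need at least one operator per stage")
    prefix = [0, *accumulate(times)]
    return _bisect(prefix, 0, len(times), num_stages)


def stage_sums(values: Sequence[float], stages: List[Stage]) -> List[float]:
    """Total of a per-operator quantity (time or memory) for each stage."""
    return [sum(values[lo:hi]) for lo, hi in stages]


def over_budget(mem: Sequence[float], stages: List[Stage], capacity: float) -> List[int]:
    """Indices of stages whose memory exceeds the per-GPU capacity."""
    return [i for i, m in enumerate(stage_sums(mem, stages)) if m > capacity]


times = [4, 1, 3, 2, 2, 5, 1, 2]  # illustrative per-op time (ms)
mem = [2, 6, 1, 1, 3, 1, 4, 2]    # illustrative per-op memory (GB)
stages = partition(times, 4)
print(stages)                      # [(0, 2), (2, 4), (4, 6), (6, 8)]
print(stage_sums(times, stages))   # [5, 5, 7, 3]
print(stage_sums(mem, stages))     # [8, 2, 4, 6]
print(over_budget(mem, stages, 6)) # [0]
```

Here stage 0 needs 8 GB while stage 1 uses only 2 GB, so with a 6 GB budget the time-balanced split does not fit even though other GPUs sit mostly idle. A memory-aware method would have to move the cut or apply memory optimizations such as swapping or recomputation; how DawnPiper's cost model chooses among these is described in the paper, not in this sketch.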

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Graph Theory and Algorithms · Advanced Neural Network Applications