Pipeline Parallelism with Controllable Memory
Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin

TL;DR
This paper introduces a systematic framework that decomposes pipeline-parallel schedules into a repeating building block, and uses it to design memory-efficient blocks that reduce peak activation memory and improve throughput relative to 1F1B.
Contribution
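It proposes a framework that decomposes a pipeline schedule into a repeating building block, shows that the lifespan of that block determines the schedule's peak activation memory, and introduces a family of memory-efficient building blocks with controllable activation memory.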
Findings
Reduces peak activation memory to 1/2 of 1F1B without sacrificing efficiency, and to 1/3 with comparable throughput.
Achieves 7% to 55% throughput improvement in pure pipeline parallelism.
Demonstrates 16% throughput gain over 1F1B baseline in large language models.
Abstract
Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block, and show that the lifespan of the building block decides the peak activation memory of the pipeline schedule. Guided by the observations, we find that almost all existing pipeline schedules, to the best of our knowledge, are memory inefficient. To address this, we introduce a family of memory efficient building blocks with controllable activation memory, which can reduce the peak activation memory to 1/2 of 1F1B without sacrificing efficiency, and even to 1/3 with comparable throughput. We can also achieve almost zero pipeline bubbles while maintaining the same activation memory as 1F1B. Our evaluations demonstrate that in pure pipeline parallelism settings, our methods outperform…
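The link between building-block lifespan and peak activation memory can be illustrated with a small counting sketch. It is not the authors' code: it assumes a block is repeated at a fixed interval, that each microbatch holds its activations for exactly one lifespan, and that time is measured in arbitrary integer units. The function name and parameters are hypothetical, and the numbers in the usage lines are illustrative rather than taken from the paper.

```python
from math import ceil


def peak_in_flight(lifespan: int, interval: int, num_microbatches: int) -> int:
    """Return the maximum number of microbatches whose activations coexist.

    Assumption (illustrative): microbatch k allocates its activations at
    time k * interval and frees them at k * interval + lifespan.
    """
    events = []
    for k in range(num_microbatches):
        start = k * interval
        events.append((start, +1))             # activations allocated
        events.append((start + lifespan, -1))  # activations freed
    # At equal timestamps, process frees (-1) before allocations (+1).
    events.sort(key=lambda e: (e[0], e[1]))

    alive = peak = 0
    for _, delta in events:
        alive += delta
        peak = max(peak, alive)
    return peak


# With enough microbatches the peak equals ceil(lifespan / interval),
# so halving the lifespan halves the number of live activations.
baseline = peak_in_flight(lifespan=12, interval=3, num_microbatches=16)
halved = peak_in_flight(lifespan=6, interval=3, num_microbatches=16)
print(baseline, halved)  # prints "4 2"
```

Under these assumptions the peak depends only on the ratio of lifespan to repeat interval, which is why shortening the lifespan of the building block is the lever for reducing activation memory.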
Taxonomy
Topics: Power System Optimization and Stability
