Pipeline Parallelism with Controllable Memory

Penghui Qi; Xinyi Wan; Nyamdavaa Amar; Min Lin

arXiv:2405.15362 · cs.LG · November 5, 2024

TL;DR

This paper introduces a systematic framework for pipeline parallelism that treats a schedule as a repeating building block and designs memory-efficient blocks that reduce peak activation memory and improve throughput, outperforming existing methods in pure pipeline parallelism and in large language model training.

Contribution

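It proposes a framework that decomposes a pipeline schedule into a repeating building block, shows that the block's lifespan determines the schedule's peak activation memory, and introduces a family of memory-efficient building blocks whose activation memory can be controlled.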

Findings

01 Reduces peak activation memory to 1/2 of 1F1B without sacrificing efficiency, or to 1/3 with comparable throughput.

02 Achieves a 7% to 55% throughput improvement in pure pipeline parallelism.

03 Demonstrates a 16% throughput gain over the 1F1B baseline for large language models.

Abstract

Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block, and show that the lifespan of the building block decides the peak activation memory of the pipeline schedule. Guided by the observations, we find that almost all existing pipeline schedules, to the best of our knowledge, are memory inefficient. To address this, we introduce a family of memory efficient building blocks with controllable activation memory, which can reduce the peak activation memory to 1/2 of 1F1B without sacrificing efficiency, and even to 1/3 with comparable throughput. We can also achieve almost zero pipeline bubbles while maintaining the same activation memory as 1F1B. Our evaluations demonstrate that in pure pipeline parallelism settings, our methods outperform…
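The abstract's central claim, that the lifespan of the repeating building block determines peak activation memory, can be checked with a small simulation. This is an illustrative sketch, not the authors' code. It assumes unit costs (forward = 1, backward = 2, communication ignored), assumes a microbatch's activations are allocated on a stage when its forward pass starts and freed when its backward pass ends, and uses hypothetical helpers `one_f_one_b_block`, `simulate_peak` and `lifespan_bound`. Under these assumptions, a block repeated every T time units with lifespan l on a stage keeps ⌈l/T⌉ microbatches' activations alive at once.

```python
import math

# Hypothetical helpers illustrating "lifespan decides peak activation memory".
# Assumptions (not taken from the paper's code): forward costs t_f, backward
# costs t_b, communication is free, and one microbatch's activations live on a
# stage from the start of its forward pass to the end of its backward pass.


def one_f_one_b_block(p, t_f=1, t_b=2):
    """Return (fwd_start, bwd_end) of one microbatch on each of p stages under 1F1B."""
    block = []
    for i in range(p):
        fwd_start = i * t_f                      # forward reaches stage i after i forward steps
        bwd_end = p * t_f + (p - i) * t_b        # backward runs from the last stage back to stage i
        block.append((fwd_start, bwd_end))
    return block


def simulate_peak(fwd_start, bwd_end, interval, num_microbatches):
    """Repeat one stage's slot every `interval` units; return the max number of live activations."""
    events = []
    for k in range(num_microbatches):
        offset = k * interval
        events.append((offset + fwd_start, +1))  # allocate at forward start
        events.append((offset + bwd_end, -1))    # free at backward end
    events.sort()  # at equal times, -1 sorts before +1, so frees happen first
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def lifespan_bound(fwd_start, bwd_end, interval):
    """Closed form: ceil(lifespan / interval) microbatches alive in steady state."""
    return math.ceil((bwd_end - fwd_start) / interval)


if __name__ == "__main__":
    p = 4
    interval = 1 + 2  # one forward plus one backward per microbatch per stage
    for i, (f, b) in enumerate(one_f_one_b_block(p)):
        sim = simulate_peak(f, b, interval, num_microbatches=2 * p)
        print(f"stage {i}: lifespan={b - f}, simulated peak={sim}, "
              f"ceil(l/T)={lifespan_bound(f, b, interval)}")
```

Under these toy assumptions, the 1F1B block gives stage i a lifespan of 3(p − i) with T = 3, so stage 0 holds p microbatches' activations, matching the familiar 1F1B memory profile. A block that halved stage 0's lifespan at the same repeat interval would halve its peak; this is the kind of lever the paper's memory-efficient building blocks are described as controlling.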

Code & Models

Repositories

sail-sg/zero-bubble-pipeline-parallelism
PyTorch · Official
