Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training

Ao Sun; Weilin Zhao; Xu Han; Cheng Yang; Xinrong Zhang; Zhiyuan Liu; Chuan Shi; Maosong Sun

arXiv:2406.03488·cs.DC·November 12, 2024·1 cites

Open Access · 1 Repo

TL;DR

Seq1F1B is a sequence-level pipeline parallelism method that raises training throughput and lowers memory footprint for large language models on long sequences, enabling training of a 30B-parameter model on sequences of up to 64k tokens without recomputation.

Contribution

The paper proposes Seq1F1B, a novel sequence-level pipeline scheduling method that reduces memory footprint and pipeline bubbles, enabling efficient training of large models on very long sequences.
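To make the bubble claim concrete, here is a rough back-of-the-envelope sketch rather than anything taken from the paper. It uses the standard idealized 1F1B estimate, in which a pipeline with p stages and m equally sized schedulable units idles for a fraction (p - 1) / (m + p - 1) of the time, and it assumes (hypothetically) that splitting each micro-batch into k equal-cost sequence segments simply multiplies the number of units by k. Communication cost and uneven segment cost are ignored, and the function names are illustrative only.

```python
from fractions import Fraction


def bubble_fraction(stages: int, units: int) -> Fraction:
    """Idealized 1F1B idle fraction, (p - 1) / (m + p - 1).

    Assumes every schedulable unit costs the same on every stage and
    that communication is free.
    """
    if stages < 1 or units < 1:
        raise ValueError("stages and units must be positive")
    return Fraction(stages - 1, units + stages - 1)


def seq_level_bubble_fraction(stages: int, micro_batches: int, segments: int) -> Fraction:
    """Same estimate when each micro-batch is cut into `segments` units.

    Assumption (not from the paper): the segments have equal cost, so the
    schedule simply sees micro_batches * segments smaller units.
    """
    return bubble_fraction(stages, micro_batches * segments)


if __name__ == "__main__":
    p, m = 8, 8  # hypothetical pipeline depth and micro-batch count
    for k in (1, 2, 4):
        frac = seq_level_bubble_fraction(p, m, k)
        print(f"segments={k}: bubble fraction = {frac} ({float(frac):.3f})")
```

Under these assumptions the idle fraction shrinks as the number of segments grows, which is the intuition behind scheduling at sequence level; the real schedule must also respect the dependency between segments of the same sequence, which this estimate does not model.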

Findings

01 Achieves higher throughput than baseline pipeline-parallel methods.

02 Reduces memory footprint during training.

03 Successfully trains a 30B-parameter model on 64k-token sequences without recomputation.

Abstract

The emergence of large language models (LLMs) relies heavily on distributed training strategies, among which pipeline parallelism plays a crucial role. As LLMs' training sequence length extends to 32k or even 128k, the current pipeline parallel methods face severe bottlenecks, including high memory footprints and substantial pipeline bubbles, greatly hindering model scalability and training throughput. To enhance memory efficiency and training throughput, in this work, we introduce an efficient sequence-level one-forward-one-backward (1F1B) pipeline scheduling method tailored for training LLMs on long sequences named Seq1F1B. Seq1F1B decomposes batch-level schedulable units into finer sequence-level units, reducing bubble size and memory footprint. Considering that Seq1F1B may produce slight extra bubbles if sequences are split evenly, we design a computation-wise strategy to partition…
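The abstract is cut off before it describes the partitioning strategy, so the following is only one plausible illustration of what "computation-wise" splitting could mean, not the authors' algorithm. It assumes a toy cost model in which each token has a fixed linear cost plus a causal-attention cost proportional to its position, and it cuts the sequence into contiguous segments of roughly equal total cost. The names `token_costs`, `balanced_split`, `linear_cost`, and `attn_cost` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def token_costs(seq_len: int, linear_cost: float = 1.0, attn_cost: float = 0.01) -> list[float]:
    """Toy per-token cost: a fixed term plus a term that grows with position.

    Under causal masking the token at 0-based position i attends to i + 1
    tokens, so later tokens are assumed to be more expensive.
    """
    return [linear_cost + attn_cost * (i + 1) for i in range(seq_len)]


def balanced_split(costs: list[float], segments: int) -> list[int]:
    """Split into contiguous segments whose summed costs are roughly equal.

    Greedy rule: place the j-th cut at the first prefix whose cumulative
    cost reaches j / segments of the total. Returns the segment lengths.
    """
    n = len(costs)
    if not 1 <= segments <= n:
        raise ValueError("need 1 <= segments <= len(costs)")
    prefix = list(accumulate(costs))
    total = prefix[-1]
    bounds = [0]
    for j in range(1, segments):
        target = total * j / segments
        cut = bisect_left(prefix, target) + 1  # tokens needed to reach target
        cut = max(cut, bounds[-1] + 1)         # keep this segment non-empty
        cut = min(cut, n - (segments - j))     # leave room for later segments
        bounds.append(cut)
    bounds.append(n)
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    lengths = balanced_split(token_costs(1024), segments=4)
    print("segment lengths:", lengths)
```

With a position-dependent cost the early segments come out longer than the late ones, whereas an even split would leave the last segment with the most work. That imbalance is the kind of extra bubble the abstract attributes to splitting sequences evenly.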

Code & Models

Repositories

maydomine/seq1f1b (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis