Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency

Ziming Liu; Shenggan Cheng; Haotian Zhou; Yang You

arXiv:2308.15762·cs.DC·August 31, 2023

TL;DR

Hanayo introduces a wave-like pipeline parallelism method that improves large model training efficiency by reducing pipeline bubbles and memory use, achieving up to 30.4% higher throughput.

Contribution

The paper presents Hanayo, a wave-like pipeline parallelism strategy paired with a high-performance pipeline execution runtime. It outperforms existing methods without duplicating the model, as Chimera does.

Findings

01

Up to 30.4% higher throughput than the state-of-the-art approach.

02

Mitigates the pipeline bubbles and excessive memory consumption found in existing schemes.

03

Evaluated on four computing clusters with GPT-like and BERT-like models, using up to 32 GPUs.

Abstract

Large-scale language models have become increasingly challenging and expensive to train. Among various methods addressing this issue, Pipeline Parallelism has been widely employed to accommodate massive model weights within limited GPU memory. This paper introduces Hanayo, a wave-like pipeline parallelism strategy that boasts a concise structure and practical applicability, alongside a high-performance pipeline execution runtime to tackle the challenges of pipeline strategy implementation. Hanayo mitigates the issues of pipeline bubbles and excessive memory consumption prevalent in existing schemes, without resorting to model duplicates as in Chimera. Our evaluation, conducted on four distinct computing clusters and involving both GPT-like and BERT-like architectures with up to 32 GPUs, demonstrates up to a 30.4% increase in throughput compared to the state-of-the-art approach.
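
To make the term "pipeline bubbles" concrete, the sketch below computes the idle fraction of a plain synchronous pipeline schedule (GPipe-style), not of Hanayo itself. It assumes every stage takes the same time per micro-batch, in which case the well-known idle fraction is (P - 1) / (M + P - 1) for P stages and M micro-batches. The function name `bubble_fraction` and its parameters are illustrative and do not come from the paper.

```python
from fractions import Fraction


def bubble_fraction(num_stages: int, num_microbatches: int) -> Fraction:
    """Idle fraction of a synchronous GPipe-style pipeline schedule.

    Assumption: all stages take equal time per micro-batch. This is the
    textbook baseline formula, not Hanayo's wave-like schedule.
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("num_stages and num_microbatches must be >= 1")
    # Each device waits (P - 1) slots out of (M + P - 1) total slots.
    return Fraction(num_stages - 1, num_microbatches + num_stages - 1)


if __name__ == "__main__":
    # More micro-batches per step shrink the bubble for a fixed stage count.
    for m in (4, 8, 32):
        frac = bubble_fraction(8, m)
        print(f"P=8, M={m}: bubble fraction = {frac} ({float(frac):.1%})")
```

Under this baseline, the bubble shrinks only when the number of micro-batches grows, which in turn raises activation memory. That trade-off is the one the abstract says Hanayo targets.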
