Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency
Ziming Liu, Shenggan Cheng, Haotian Zhou, Yang You

TL;DR
Hanayo is a wave-like pipeline parallelism method that improves large model training efficiency by reducing pipeline bubbles and memory consumption, achieving up to 30.4% higher throughput than the state-of-the-art approach.
Contribution
The paper presents Hanayo, a wave-like pipeline parallelism strategy paired with a high-performance pipeline execution runtime. Unlike Chimera, it does not rely on model duplicates.
Findings
Up to 30.4% higher throughput than the state-of-the-art approach.
Mitigation of pipeline bubbles and excessive memory consumption.
Evaluated on four computing clusters with GPT-like and BERT-like models, using up to 32 GPUs.
Abstract
Large-scale language models have become increasingly challenging and expensive to train. Among various methods addressing this issue, Pipeline Parallelism has been widely employed to accommodate massive model weights within limited GPU memory. This paper introduces Hanayo, a wave-like pipeline parallelism strategy that boasts a concise structure and practical applicability, alongside a high-performance pipeline execution runtime to tackle the challenges of pipeline strategy implementation. Hanayo mitigates the issues of pipeline bubbles and excessive memory consumption prevalent in existing schemes, without resorting to model duplicates as in Chimera. Our evaluation, conducted on four distinct computing clusters and involving both GPT-like and BERT-like architectures with up to 32 GPUs, demonstrates up to a 30.4% increase in throughput compared to the state-of-the-art approach.
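To make the term "pipeline bubble" concrete, the sketch below computes the idle fraction of a plain synchronous, GPipe-style schedule. This is not Hanayo's wave schedule. The function name `bubble_fraction` is hypothetical, and the formula assumes every stage takes equal time per micro-batch and that communication cost is ignored.

```python
def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle-time fraction of a simple synchronous pipeline schedule.

    Assumption (not from the paper): all stages take the same time per
    micro-batch and communication is free. Each device is then busy for
    num_microbatches slots out of num_microbatches + num_stages - 1.
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("num_stages and num_microbatches must be >= 1")
    return (num_stages - 1) / (num_microbatches + num_stages - 1)


if __name__ == "__main__":
    # More micro-batches per step shrink the bubble for a fixed pipeline depth.
    for m in (8, 16, 32):
        print(f"stages=8, microbatches={m}: bubble={bubble_fraction(8, m):.3f}")
```

Under these assumptions the bubble shrinks only as the number of micro-batches grows, which in turn raises activation memory. That is the trade-off between bubbles and memory consumption the abstract refers to.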
