StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training
Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Kai Wang, Xuanlei Zhao, James Demmel, Yang You

TL;DR
StarTrail introduces a multi-dimensional parallelism approach for efficient training of long-sequence Transformer models, significantly reducing communication overhead and improving performance across NLP and CV tasks.
Contribution
It proposes a novel concentric ring parallelism method with an extra dimension and sub-ring communication to enhance scalability and efficiency in long-sequence Transformer training.
Findings
Achieves up to 77.12% performance improvement on GPT models
Achieves up to 114.33% performance improvement on DiT models
Reduces communication volume and bandwidth bottlenecks significantly
Abstract
Training Transformer models on long sequences in a distributed setting poses significant challenges in terms of efficiency and scalability. Current methods are constrained either by the number of attention heads or by excessive communication overhead. To address this problem, we propose StarTrail, a multi-dimensional concentric distributed training system for long sequences, which fosters an efficient communication paradigm and provides additional tuning flexibility for communication arrangements. Specifically, StarTrail introduces an extra parallel dimension and divides the peer-to-peer communication into sub-rings to substantially reduce communication volume and avoid bandwidth bottlenecks. Through comprehensive experiments across diverse hardware environments and on both Natural Language Processing (NLP) and Computer Vision (CV) tasks, we demonstrate that our approach significantly…
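The sub-ring idea in the abstract can be illustrated with a small, hedged sketch. It is not StarTrail's actual scheme: the function names (`build_sub_rings`, `ring_neighbors`, `p2p_steps`) and the contiguous rank grouping are assumptions made for illustration only. It shows just one point, that splitting a single peer-to-peer ring of devices into several smaller rings shortens the chain of send/receive steps each device takes part in. The exchange between sub-rings along the extra parallel dimension is not modeled here.

```python
def build_sub_rings(world_size: int, num_rings: int) -> list[list[int]]:
    """Partition ranks 0..world_size-1 into equally sized, contiguous sub-rings.

    Hypothetical helper: the contiguous layout is an assumption, not the paper's.
    """
    if num_rings <= 0 or world_size % num_rings != 0:
        raise ValueError("world_size must be divisible by num_rings")
    ring_size = world_size // num_rings
    return [list(range(r * ring_size, (r + 1) * ring_size)) for r in range(num_rings)]


def ring_neighbors(ring: list[int], rank: int) -> tuple[int, int]:
    """Return (previous, next) rank for `rank` inside its ring, wrapping around."""
    i = ring.index(rank)
    return ring[(i - 1) % len(ring)], ring[(i + 1) % len(ring)]


def p2p_steps(ring_size: int) -> int:
    """Send/receive steps needed for every block to visit every rank of one ring."""
    return ring_size - 1


if __name__ == "__main__":
    world_size = 8
    single_ring = build_sub_rings(world_size, 1)[0]
    sub_rings = build_sub_rings(world_size, 2)
    print("single ring steps:", p2p_steps(len(single_ring)))
    print("sub-rings:", sub_rings)
    print("steps per sub-ring:", p2p_steps(len(sub_rings[0])))
    print("neighbors of rank 4:", ring_neighbors(sub_rings[1], 4))
```

With eight devices, one ring needs seven sequential peer-to-peer steps per pass, while two sub-rings of four devices need three steps each. How the sub-rings then combine their partial results is the part the paper's extra dimension addresses and is left out of this sketch.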
