USP: A Unified Sequence Parallelism Approach for Long Context Generative AI
Jiarui Fang, Shangchun Zhao

TL;DR
This paper introduces a unified sequence parallelism approach for long-context generative AI models, improving robustness and efficiency in training large models with extended sequence lengths.
Contribution
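It proposes a unified sequence parallelism method that is more robust to transformer model architectures and network hardware topology than the existing SP approaches it investigates (DeepSpeed-Ulysses and Ring-Attention), and provides best practices for designing hybrid 4D parallelism that involves SP.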
Findings
Achieved 47% MFU when training LLAMA3-8B with SP at sequence length 208K on two 8xA800 nodes.
Compared the communication and memory costs of SP with those of data, tensor, ZeRO, and pipeline parallelism.
Code for the approach is publicly available.
Abstract
Sequence parallelism (SP), which divides the sequence dimension of input tensors across multiple computational devices, is becoming key to unlocking the long-context capabilities of generative AI models. This paper investigates the state-of-the-art SP approaches, i.e. DeepSpeed-Ulysses and Ring-Attention, and proposes a unified SP approach, which is more robust to transformer model architectures and network hardware topology. This paper compares the communication and memory cost of SP and existing parallelism, including data/tensor/zero/pipeline parallelism, and discusses the best practices for designing hybrid 4D parallelism involving SP. We achieved 47% MFU on two 8xA800 nodes using SP for the LLAMA3-8B model training using sequence length 208K. Our code is publicly available at https://github.com/feifeibear/long-context-attention.
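To make the idea of dividing the sequence dimension across devices concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names `shard_sequence` and `build_2d_groups` are hypothetical, reading "208K" as 208 × 1024 tokens is an assumption, and the two-dimensional grouping (one axis for Ulysses-style all-to-all, one for Ring-style peer-to-peer exchange) is only one plausible way to combine the two approaches the abstract names.

```python
def shard_sequence(seq_len, world_size):
    """Return the [start, end) token range held by each rank
    when the sequence dimension is split evenly across devices."""
    if seq_len % world_size != 0:
        raise ValueError("this sketch assumes seq_len is divisible by world_size")
    chunk = seq_len // world_size
    return [(rank * chunk, (rank + 1) * chunk) for rank in range(world_size)]


def build_2d_groups(world_size, ulysses_degree):
    """Arrange ranks on a (ring_degree x ulysses_degree) grid.

    Assumption for illustration: consecutive ranks (typically the same node)
    form a Ulysses group, and ranks in the same grid column form a ring group.
    """
    if world_size % ulysses_degree != 0:
        raise ValueError("world_size must be divisible by ulysses_degree")
    ring_degree = world_size // ulysses_degree
    ulysses_groups = [
        list(range(row * ulysses_degree, (row + 1) * ulysses_degree))
        for row in range(ring_degree)
    ]
    ring_groups = [
        list(range(col, world_size, ulysses_degree))
        for col in range(ulysses_degree)
    ]
    return ulysses_groups, ring_groups


# Two 8-GPU nodes, as in the abstract; 208K read as 208 * 1024 (assumption).
shards = shard_sequence(208 * 1024, 16)
ulysses_groups, ring_groups = build_2d_groups(world_size=16, ulysses_degree=8)
```

With these settings each rank holds 13,312 tokens, the two Ulysses groups coincide with the two nodes, and each ring group pairs one GPU from each node. Choosing a different `ulysses_degree` changes how much traffic stays within a node, which is the kind of topology trade-off the abstract refers to.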
Code & Models
- 🤗 tencent/HunyuanVideo (model) · 1.0k dl · ♡ 2142
- 🤗 jobs-git/HunyuanVideo (model) · 1 dl
- 🤗 tencent/HunyuanVideo-I2V (model) · 145 dl · ♡ 350
- 🤗 arveo75/HunyuanVideo (model) · 2 dl
- 🤗 Blyskawica09/HunyuanVideo (model) · 1 dl
- 🤗 Khanbby/HunyuanVideo (model) · 13 dl · ♡ 1
- 🤗 YemenCreative2026/HunyuanVideo (model) · 11 dl
- 🤗 adisaljusi/HunyuanVideo (model) · 8 dl
Taxonomy
Topics: Neural Networks and Applications · Evolutionary Algorithms and Applications · Algorithms and Data Compression
