Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking

Ravi Ghadia; Maksim Abraham; Sergei Vorobyov; Max Ryabinin

arXiv:2602.21196 · cs.LG · February 25, 2026

TL;DR

This paper introduces UPipe, a memory-efficient context parallelism method that enables training of Transformer models with much longer sequences by reducing activation memory usage through headwise chunking.

Contribution

UPipe is a novel technique that performs fine-grained chunking at the attention head level, significantly reducing the activation memory of self-attention and thereby supporting longer context lengths in Transformer training.
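Here is a minimal single-device sketch of why head-level chunking saves memory. It assumes plain multi-head self-attention with no masking, no distribution across GPUs and no backward pass. Attention heads are computed independently, so they can be processed a few at a time, and only one chunk's Q, K, V and score tensors need to be alive at once. The function `headwise_chunked_mha` and its arguments are hypothetical and are not the authors' code. UPipe applies chunking inside a context-parallel setup, which this sketch leaves out.

```python
import numpy as np


def headwise_chunked_mha(x, w_q, w_k, w_v, num_heads, heads_per_chunk):
    """Multi-head self-attention computed one group of heads at a time.

    Hypothetical illustration. x: [seq, hidden]; w_q, w_k, w_v: [hidden, hidden].
    Returns the [seq, hidden] output and a rough count of the intermediate
    elements (Q, K, V and one [heads, seq, seq] score matrix) of the largest chunk.
    """
    seq, hidden = x.shape
    head_dim = hidden // num_heads
    out = np.empty((seq, hidden))
    peak = 0
    for h0 in range(0, num_heads, heads_per_chunk):
        h1 = min(h0 + heads_per_chunk, num_heads)
        cols = slice(h0 * head_dim, h1 * head_dim)
        # Project only the columns of this chunk's heads -> [chunk_heads, seq, head_dim].
        q, k, v = (
            (x @ w[:, cols]).reshape(seq, h1 - h0, head_dim).transpose(1, 0, 2)
            for w in (w_q, w_k, w_v)
        )
        scores = np.einsum("hsd,htd->hst", q, k) / np.sqrt(head_dim)
        probs = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
        probs /= probs.sum(axis=-1, keepdims=True)
        o = np.einsum("hst,htd->hsd", probs, v)
        # Write this chunk's heads back into their columns of the output.
        out[:, cols] = o.transpose(1, 0, 2).reshape(seq, -1)
        peak = max(peak, q.size + k.size + v.size + scores.size)
    return out, peak


rng = np.random.default_rng(0)
seq, hidden, heads = 32, 64, 8
x = rng.standard_normal((seq, hidden))
w_q, w_k, w_v = (rng.standard_normal((hidden, hidden)) / np.sqrt(hidden) for _ in range(3))
full, full_peak = headwise_chunked_mha(x, w_q, w_k, w_v, heads, heads_per_chunk=heads)
chunk, chunk_peak = headwise_chunked_mha(x, w_q, w_k, w_v, heads, heads_per_chunk=2)
print(np.allclose(full, chunk), chunk_peak / full_peak)  # True 0.25
```

Under these assumptions the chunked output matches the unchunked one. The rough intermediate-element count shrinks in proportion to `heads_per_chunk / num_heads`, which gives 0.25 when 2 of 8 heads are processed at a time.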

Findings

1. Reduces activation memory by up to 87.5% for 32B-parameter Transformers.
2. Supports a context length of 5 million tokens when training Llama3-8B on a single node.
3. Matches the training speed of previous methods while enabling longer sequences.
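The following back-of-the-envelope model gives a sense of scale for these numbers. It is our own simplification, not the paper's accounting. It counts only the bf16 Q, K and V tensors of one attention layer after a Ulysses-style all-to-all, where each GPU holds the full sequence for its share of the heads. The Llama3-8B attention shapes (32 query heads, 8 KV heads, head dimension 128) come from the public model config. The 8-GPU node and the helper names `qkv_activation_bytes` and `chunked_saving` are hypothetical.

```python
def qkv_activation_bytes(seq_len, q_heads, kv_heads, head_dim, bytes_per_elem=2):
    """Bytes to hold Q (q_heads) plus K and V (kv_heads each) over seq_len tokens.

    Toy model (assumption): ignores attention outputs, softmax statistics,
    gradients, MLP activations and communication buffers.
    """
    return seq_len * (q_heads + 2 * kv_heads) * head_dim * bytes_per_elem


def chunked_saving(heads_per_gpu, heads_per_chunk):
    """Fraction of Q/K/V memory saved in the toy model when only
    heads_per_chunk of a GPU's heads_per_gpu heads are materialized at once
    (assumes equal query and KV head counts)."""
    return 1 - heads_per_chunk / heads_per_gpu


# Llama3-8B attention shapes: 32 query heads, 8 KV heads, head_dim 128.
# Hypothetical 8-GPU node: Ulysses gives each GPU 4 query heads and 1 KV head
# over the full 5M-token sequence (one layer, bf16 = 2 bytes per element).
GPUS = 8
per_gpu = qkv_activation_bytes(5_000_000, 32 // GPUS, 8 // GPUS, 128)
print(per_gpu / 1e9)         # 7.68 (GB per GPU, one layer)
print(chunked_saving(8, 2))  # 0.75
```

Even in this toy model, a single layer's Q/K/V share is several gigabytes per GPU at 5 million tokens, so the number of heads materialized at once matters. The `chunked_saving` ratio only describes the toy model. It is not meant to reproduce the 87.5% figure listed in the findings.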

Abstract

Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism. The dominant approaches in this family of methods, such as Ring Attention or DeepSpeed Ulysses, enable scaling over the context dimension but do not focus on memory efficiency, which limits the sequence lengths they can support. More advanced techniques, such as Fully Pipelined Distributed Transformer or activation offloading, can further extend the possible context length at the cost of training throughput. In this paper, we present UPipe, a simple yet effective context parallelism technique that performs fine-grained chunking at the attention head level. This technique significantly reduces the activation memory usage of self-attention, breaking the activation memory barrier and unlocking much longer context lengths. Our approach…
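The DeepSpeed Ulysses pattern mentioned in the abstract can be simulated in a few lines. Each rank starts with a sequence shard that holds all heads. An all-to-all then hands each rank the full sequence for a subset of heads, attention runs locally, and a reverse all-to-all restores sequence sharding. The sketch loops that exchange over chunks of heads. This is our reading of "chunking at the attention head level" and not the authors' implementation. In the sketch, ranks are simulated as Python lists on one process, attention is non-causal, and `seq_to_head`, `head_to_seq` and `ulysses_headwise` are hypothetical names. UPipe's actual scheduling, communication/compute overlap and backward pass are not modeled.

```python
import numpy as np


def attention(q, k, v):
    """Non-causal softmax attention on [heads, seq, head_dim] arrays."""
    scores = np.einsum("hsd,htd->hst", q, k) / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.einsum("hst,htd->hsd", probs, v)


def seq_to_head(shards, h0, h1, world):
    """Simulated all-to-all: sequence-sharded -> head-sharded, for heads [h0, h1).

    shards[r] is rank r's [heads, seq // world, head_dim] slice. Rank j
    receives the full sequence for its (h1 - h0) // world heads of the chunk.
    """
    per_rank = (h1 - h0) // world
    return [
        np.concatenate(
            [s[h0 + j * per_rank : h0 + (j + 1) * per_rank] for s in shards], axis=1
        )
        for j in range(world)
    ]


def head_to_seq(outs, world):
    """Simulated all-to-all: head-sharded -> sequence-sharded.

    outs[j] is rank j's [chunk_heads // world, seq, head_dim] result. Rank r
    gets back its own sequence slice for every head in the chunk.
    """
    shard_len = outs[0].shape[1] // world
    return [
        np.concatenate([o[:, r * shard_len : (r + 1) * shard_len] for o in outs], axis=0)
        for r in range(world)
    ]


def ulysses_headwise(q_sh, k_sh, v_sh, world, heads_per_chunk):
    """Ulysses-style attention executed one chunk of heads at a time."""
    num_heads = q_sh[0].shape[0]
    assert heads_per_chunk % world == 0 and num_heads % heads_per_chunk == 0
    out_sh = [np.empty_like(s) for s in q_sh]
    for h0 in range(0, num_heads, heads_per_chunk):
        h1 = h0 + heads_per_chunk
        # Per rank, only this chunk's full-sequence Q/K/V are built here.
        qs, ks, vs = (seq_to_head(sh, h0, h1, world) for sh in (q_sh, k_sh, v_sh))
        outs = [attention(q, k, v) for q, k, v in zip(qs, ks, vs)]
        for r, o in enumerate(head_to_seq(outs, world)):
            out_sh[r][h0:h1] = o
    return out_sh


rng = np.random.default_rng(0)
world, heads, seq, head_dim = 4, 8, 16, 8
q, k, v = (rng.standard_normal((heads, seq, head_dim)) for _ in range(3))
q_sh, k_sh, v_sh = (np.split(x, world, axis=1) for x in (q, k, v))
out_sh = ulysses_headwise(q_sh, k_sh, v_sh, world, heads_per_chunk=4)
print(np.allclose(np.concatenate(out_sh, axis=1), attention(q, k, v)))  # True
```

When `heads_per_chunk` equals the total head count, this reduces to a single plain Ulysses step. Smaller chunks keep only `heads_per_chunk / world` heads' full-sequence Q, K and V per rank at a time, at the cost of more, smaller all-to-all exchanges.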

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis