TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training

Houming Wu; Ling Chen

arXiv:2511.09741·cs.LG·November 14, 2025

TL;DR

TawPipe introduces a topology-aware weight pipeline parallelism method that leverages hierarchical bandwidth in distributed clusters to efficiently train large language models with long contexts, reducing communication overhead and improving scalability.

Contribution

It proposes a novel topology-aware approach that optimizes intra- and inter-node communication, avoiding redundant data transfers and overlapping communication with computation for better performance.
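The split between intra-node and inter-node communication can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `group_ranks_by_node` and `link_type` are hypothetical, ranks are assumed to be numbered consecutively within each node, and 8 GPUs per node is an assumed value (only the 24-GPU total appears in the findings below).

```python
from collections import defaultdict


def group_ranks_by_node(world_size, gpus_per_node):
    """Map node index -> list of global ranks, assuming consecutive ranks share a node."""
    groups = defaultdict(list)
    for rank in range(world_size):
        groups[rank // gpus_per_node].append(rank)
    return dict(groups)


def link_type(src, dst, gpus_per_node):
    """Classify a transfer as staying inside one node or crossing nodes."""
    same_node = src // gpus_per_node == dst // gpus_per_node
    return "intra-node" if same_node else "inter-node"


if __name__ == "__main__":
    # 24 GPUs total; 8 per node is an assumption made for illustration only.
    for node, ranks in group_ranks_by_node(24, 8).items():
        print(node, ranks)
    print(link_type(6, 7, 8), link_type(7, 8, 8))
```

Under this kind of grouping, a scheme could route transfers within a group over the faster intra-node links and keep cross-node traffic to point-to-point exchanges between groups, which is the general idea the contribution describes.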

Findings

01 Achieves higher throughput on up to 24 GPUs

02 Reduces cross-node communication significantly

03 Outperforms existing state-of-the-art methods in scalability

Abstract

Training large language models (LLMs) is fundamentally constrained by limited device memory and costly inter-device communication. Although pipeline parallelism alleviates memory pressure by partitioning models across devices, it incurs activation communication overhead that scales linearly with sequence length, limiting efficiency in long-context training. Recent weight-passing approaches (e.g., WeiPipe) mitigate this by transmitting model weights instead of activations, but suffer from redundant peer-to-peer (P2P) transfers and underutilized intra-node bandwidth. We propose TawPipe (topology-aware weight pipeline parallelism), which exploits hierarchical bandwidth in distributed clusters for improved communication efficiency. TawPipe: (i) groups devices based on topology to optimize intra-node collective and inter-node P2P communication; (ii) assigns each device a fixed shard of model…
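The abstract's point that activation traffic grows linearly with sequence length while weight traffic does not can be made concrete with a back-of-the-envelope sketch. It is not taken from the paper. It assumes a GPT-style transformer layer with roughly 12·h² weights (4·h² for attention and 8·h² for an MLP with 4x expansion, ignoring biases and norms), 2-byte elements, and a single tensor per transfer. The function names are hypothetical, and the paper's own accounting (for example, gradient transfers) may differ.

```python
def activation_bytes(micro_batch, seq_len, hidden, bytes_per_elem=2):
    """Bytes in one activation tensor of shape (micro_batch, seq_len, hidden)."""
    return micro_batch * seq_len * hidden * bytes_per_elem


def layer_weight_bytes(hidden, bytes_per_elem=2):
    """Approximate bytes of one layer's weights, assuming ~12 * hidden^2 parameters."""
    return 12 * hidden * hidden * bytes_per_elem


def crossover_seq_len(micro_batch, hidden):
    """Sequence length at which one activation tensor matches one layer's weights in size."""
    return 12 * hidden // micro_batch


if __name__ == "__main__":
    hidden = 4096  # assumed hidden size, for illustration only
    for seq_len in (2048, 16384, 131072):
        print(seq_len, activation_bytes(1, seq_len, hidden), layer_weight_bytes(hidden))
    print("crossover:", crossover_seq_len(1, hidden))
```

Under these assumptions the activation tensor doubles in size whenever the sequence length doubles, while the per-layer weight size stays fixed, so beyond the crossover length passing weights moves fewer bytes per layer than passing activations.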

Taxonomy

Topics: Advanced Neural Network Applications · Big Data and Digital Economy · IoT and Edge/Fog Computing