TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training
Houming Wu, Ling Chen

TL;DR
TawPipe introduces a topology-aware weight pipeline parallelism method that leverages hierarchical bandwidth in distributed clusters to efficiently train large language models with long contexts, reducing communication overhead and improving scalability.
Contribution
It proposes a novel topology-aware approach that optimizes intra- and inter-node communication, avoiding redundant data transfers and overlapping communication with computation for better performance.
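As a rough illustration of what grouping devices by topology can look like, the sketch below maps global ranks to nodes and classifies a link between two ranks as intra-node or inter-node. It is not the authors' implementation. The function names, the contiguous rank-to-node layout, and the 8-GPUs-per-node figure are all assumptions made for this example. Only the 24-GPU total comes from the summary.

```python
from collections import defaultdict


def group_ranks_by_node(world_size: int, gpus_per_node: int) -> dict[int, list[int]]:
    """Map node index -> global ranks on that node.

    Assumption: ranks are laid out contiguously, so ranks
    [k * gpus_per_node, (k + 1) * gpus_per_node) share node k.
    """
    if gpus_per_node <= 0 or world_size % gpus_per_node != 0:
        raise ValueError("world_size must be a positive multiple of gpus_per_node")
    groups: dict[int, list[int]] = defaultdict(list)
    for rank in range(world_size):
        groups[rank // gpus_per_node].append(rank)
    return dict(groups)


def link_type(rank_a: int, rank_b: int, gpus_per_node: int) -> str:
    """Classify the link between two ranks under the same contiguous layout."""
    same_node = rank_a // gpus_per_node == rank_b // gpus_per_node
    return "intra-node" if same_node else "inter-node"


# Hypothetical cluster: 24 GPUs arranged as 3 nodes of 8 GPUs each.
groups = group_ranks_by_node(world_size=24, gpus_per_node=8)
for node, ranks in groups.items():
    print(f"node {node}: ranks {ranks[0]}..{ranks[-1]}")
```

In a scheme like this, traffic between ranks in the same group can use the fast intra-node interconnect, and only traffic that crosses a group boundary uses the slower inter-node network.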
Findings
Achieves higher throughput on up to 24 GPUs
Reduces cross-node communication significantly
Outperforms existing state-of-the-art methods in scalability
Abstract
Training large language models (LLMs) is fundamentally constrained by limited device memory and costly inter-device communication. Although pipeline parallelism alleviates memory pressure by partitioning models across devices, it incurs activation communication overhead that scales linearly with sequence length, limiting efficiency in long-context training. Recent weight-passing approaches (e.g., WeiPipe) mitigate this by transmitting model weights instead of activations, but suffer from redundant peer-to-peer (P2P) transfers and underutilized intra-node bandwidth. We propose TawPipe, a topology-aware weight pipeline parallelism method that exploits hierarchical bandwidth in distributed clusters for improved communication efficiency. TawPipe: (i) groups devices based on topology to optimize intra-node collective and inter-node P2P communication; (ii) assigns each device a fixed shard of model…
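The trade-off between passing activations and passing weights can be made concrete with a back-of-the-envelope calculation. The sketch below is not taken from the paper. It assumes a standard Transformer layer with roughly 12·h² parameters (4·h² for attention and 8·h² for a 4x-expansion MLP, ignoring biases and norms), 2-byte elements, and hypothetical function names. It shows that the activation tensor sent across a pipeline boundary grows linearly with sequence length, while the size of a layer's weights does not depend on sequence length.

```python
def activation_bytes(micro_batch: int, seq_len: int, hidden: int,
                     bytes_per_elem: int = 2) -> int:
    """Size of one activation tensor of shape (micro_batch, seq_len, hidden)."""
    return micro_batch * seq_len * hidden * bytes_per_elem


def layer_weight_bytes(hidden: int, bytes_per_elem: int = 2) -> int:
    """Approximate size of one Transformer layer's weights.

    Assumption: ~12 * hidden^2 parameters per layer (biases and norms ignored).
    """
    return 12 * hidden * hidden * bytes_per_elem


def crossover_seq_len(micro_batch: int, hidden: int) -> int:
    """Sequence length at which one activation tensor matches one layer's weights.

    From micro_batch * seq_len * hidden = 12 * hidden^2.
    """
    return 12 * hidden // micro_batch


hidden = 4096  # hypothetical model width
for seq_len in (4096, 16384, 65536):
    act = activation_bytes(micro_batch=1, seq_len=seq_len, hidden=hidden)
    wgt = layer_weight_bytes(hidden)
    print(f"seq_len={seq_len}: activation={act / 2**20:.0f} MiB, "
          f"layer weights={wgt / 2**20:.0f} MiB")
```

Under these assumptions, with a micro-batch of 1 and a hidden size of 4096, one activation tensor becomes larger than one layer's weights once the sequence exceeds 49,152 tokens. The real comparison also depends on the pipeline schedule and on how many transfers each scheme performs per step, which this sketch does not model.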
Taxonomy
Topics: Advanced Neural Network Applications · Big Data and Digital Economy · IoT and Edge/Fog Computing
