Exploiting Multicast for Accelerating Collective Communication
Chao Xu, Xu Zhang, Zihang Luo, Yuyan Wu, Guoxin Qian, Yufeng Yao, Chihyung Wang, Jingbin Zhou

TL;DR
This paper introduces MultiWrite, a multicast-based transmission semantic that reduces collective communication latency in AI workloads by eliminating redundant data transmissions, cutting latency by up to 33% on Ascend NPUs.
Contribution
MultiWrite is a novel many-to-many transmission semantic that overcomes the limitations of traditional multicast for AI workloads (heavy management-plane overhead and ecosystem compatibility issues), improving both latency and network efficiency.
Findings
Achieves up to 33% latency reduction on Ascend NPUs.
Reduces network congestion by eliminating redundant data packets.
Demonstrates effectiveness in large-scale AI model training and inference.
Abstract
Reducing collective communication latency is a critical goal for large model training and inference in both academia and industry. Many-to-many communications, such as AllGather and AlltoAll (dispatch), are core components of modern parallelization strategies. State-of-the-art implementations of these communications rely on unicast-based writes and transmit duplicate copies of the same data across physical links for multiple receivers. This redundant transmission congests network bottlenecks and degrades end-to-end latency. We present MultiWrite, a novel many-to-many transmission semantic that eliminates redundant packets to directly reduce operator latency. MultiWrite adopts multicast principles while addressing two critical limitations of traditional multicast for AI workloads: heavy management-plane overhead and ecosystem compatibility issues. We implement…
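To make the redundancy argument concrete, the toy model below counts how many bytes of one AllGather cross each leaf-to-spine link when every sender uses unicast writes versus when switches replicate packets multicast-style. It is a minimal sketch, not the MultiWrite protocol: the two-tier topology (leaf switches joined by a single spine), the function name `link_traffic`, and the byte accounting are all hypothetical simplifications, and traffic between ranks on the same leaf is ignored.

```python
from collections import Counter


def link_traffic(leaves, chunk_bytes, multicast):
    """Bytes carried on each leaf<->spine link for one AllGather (hypothetical toy model).

    leaves: list of lists; leaves[i] holds the rank ids attached to leaf switch i.
    Assumption: inter-leaf traffic goes leaf uplink -> single spine -> leaf downlink.
    Intra-leaf traffic never touches these links, so it is not counted.
    """
    leaf_of = {r: i for i, ranks in enumerate(leaves) for r in ranks}
    traffic = Counter()
    for src, src_leaf in leaf_of.items():
        # Leaf of every receiver that sits behind a different leaf switch.
        remote = [leaf_of[dst] for dst in leaf_of if leaf_of[dst] != src_leaf]
        if not remote:
            continue
        if multicast:
            # One copy leaves the source leaf; the spine forwards one copy
            # to each destination leaf, which fans it out locally.
            traffic[("up", src_leaf)] += chunk_bytes
            for leaf in set(remote):
                traffic[("down", leaf)] += chunk_bytes
        else:
            # Unicast: a full duplicate crosses both links for every remote receiver.
            for leaf in remote:
                traffic[("up", src_leaf)] += chunk_bytes
                traffic[("down", leaf)] += chunk_bytes
    return traffic


if __name__ == "__main__":
    topology = [[0, 1, 2, 3], [4, 5, 6, 7]]  # 8 ranks on 2 leaves
    uni = link_traffic(topology, chunk_bytes=1, multicast=False)
    mc = link_traffic(topology, chunk_bytes=1, multicast=True)
    print("uplink of leaf 0:", uni[("up", 0)], "chunks (unicast) vs", mc[("up", 0)], "(multicast)")
```

With four ranks per leaf, each sender's chunk crosses its leaf uplink four times under unicast but once with in-network replication, so the bottleneck link carries 16 chunks instead of 4. Real systems differ in topology, routing, and where replication happens; the sketch only illustrates why removing duplicate copies relieves the bottleneck links the abstract refers to.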
