MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems

Zhuoshan Zhou; Chen Zhang; Shuyi Zhang; Qijun Zhang; Haibo Wang; Zhe Zhou; Zhipeng Tu; Guangyu Sun; Yijia Diao; Zhigang Ji; Jingwen Leng; Guanghui He; Minyi Guo

arXiv:2605.05888 · cs.AR · May 8, 2026

TL;DR

MoE-Hub is a hardware-software co-design that targets the inter-GPU communication bottleneck in Mixture-of-Experts (MoE) large language models, enabling seamless overlap of communication with computation and significant per-layer speedups.

Contribution

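MoE-Hub proposes a destination-agnostic communication paradigm with hardware acceleration, addressing the abstraction mismatch between MoE's dynamic, irregular token-to-expert mapping and the static, address-centric communication model of modern GPUs.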

Findings

01. Achieves 1.40x-3.08x per-layer speedup over state-of-the-art.

02. Enables seamless communication overlap with hardware support.

03. Improves software flexibility and performance in MoE training.

Abstract

The Mixture-of-Experts (MoE) architecture is crucial for scaling large language models, but its scalability is severely limited by inter-GPU communication bottlenecks in multi-GPU systems. Although overlapping communication with computation is a widely recognized optimization, deploying it effectively remains challenging in terms of both performance and programmability. In this work, we identify the root cause as a fundamental abstraction mismatch between MoE's dynamic, irregular token-to-expert mapping and the static, address-centric communication model of modern GPUs. This mismatch necessitates a complex software mediation phase that resolves addresses before data transfers, limiting both performance and software flexibility. To resolve this, we propose MoE-Hub, a hardware-software co-design that introduces a destination-agnostic communication paradigm. MoE-Hub decouples data transmission from…
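To make the "software mediation phase" mentioned in the abstract more concrete, the following is a minimal single-process sketch of the kind of address resolution a conventional, address-centric dispatch has to perform before any token can be sent. It is not MoE-Hub's mechanism and is not taken from the paper. The function names (`resolve_dispatch_addresses`, `dispatch`), the top-1 routing, and the single packed buffer standing in for remote GPU memory are all assumptions made for illustration.

```python
from itertools import accumulate


def resolve_dispatch_addresses(expert_ids, num_experts):
    """Hypothetical helper: compute where each token must land.

    expert_ids[t] is the expert chosen for token t (top-1 routing assumed).
    Returns (counts, offsets, slots):
      counts[e]  - number of tokens routed to expert e
      offsets[e] - first slot of expert e in the packed destination buffer
      slots[t]   - destination slot of token t
    """
    # Pass 1: count tokens per expert. The result depends on the routing
    # decision, so it is only known at run time.
    counts = [0] * num_experts
    for e in expert_ids:
        counts[e] += 1

    # Exclusive prefix sum turns counts into per-expert base addresses.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: hand out one slot per token within its expert's region.
    cursor = list(offsets)
    slots = []
    for e in expert_ids:
        slots.append(cursor[e])
        cursor[e] += 1
    return counts, offsets, slots


def dispatch(tokens, slots):
    """Scatter tokens to their resolved slots (stand-in for the transfer)."""
    buffer = [None] * len(tokens)
    for token, slot in zip(tokens, slots):
        buffer[slot] = token
    return buffer


# Five tokens routed across three experts.
expert_ids = [2, 0, 2, 1, 0]
tokens = ["t0", "t1", "t2", "t3", "t4"]
counts, offsets, slots = resolve_dispatch_addresses(expert_ids, num_experts=3)
packed = dispatch(tokens, slots)
```

In this sketch no token can be written until both passes have finished, because every destination slot depends on the counts for all experts. That ordering dependency between address resolution and data movement is one plausible reading of why the abstract describes the mediation phase as an obstacle to overlapping communication with computation.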
