MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference

Xinru Tang; Jingxiang Hou; Dingcheng Jiang; Taiquan Wei; Jiaxin Liu; Jinyi Deng; Huizheng Wang; Qize Yang; Haoran Shang; Chao Li; Yang Hu; Shouyi Yin

arXiv:2510.25258·cs.DC·October 30, 2025

TL;DR

This paper introduces two co-design techniques for wafer-scale chips, ER-Mapping and NI-Balancer, that target large-scale expert-parallel inference. They reduce communication by up to 62% and improve MoE computation by 54%.

Contribution

The paper presents ER-Mapping and NI-Balancer, two methods tailored to wafer-scale chips that improve MoE inference efficiency by balancing communication pressure on the mesh network and reducing expert-migration overhead.

Findings

01. ER-Mapping reduces communication by up to 62%.

02. NI-Balancer improves MoE computation by 54%.

03. WSC platform outperforms NVL72 supernode with 39% higher per-device performance.

Abstract

As large language models (LLMs) continue to scale up, mixture-of-experts (MoE) has become a common technology in SOTA models. MoE models rely on expert parallelism (EP) to alleviate the memory bottleneck, which introduces all-to-all communication to dispatch and combine tokens across devices. However, in widely adopted GPU clusters, high-overhead cross-node communication makes all-to-all expensive, hindering the adoption of EP. Recently, wafer-scale chips (WSCs) have emerged as a platform integrating numerous devices on a wafer-sized interposer. WSCs provide a unified high-performance network connecting all devices, presenting promising potential for hosting MoE models. Yet, their network is restricted to a mesh topology, causing imbalanced communication pressure and performance loss. Moreover, the lack of an on-wafer disk leads to high-overhead expert migration on the critical path. To…
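The abstract's point that a mesh topology causes imbalanced communication pressure can be illustrated with a small sketch. It counts how many messages cross each directed link when every device on a k x k mesh sends one unit message to every other device. The XY dimension-ordered routing, the uniform traffic pattern, and the function name `mesh_link_loads` are assumptions made for illustration; they are not taken from the paper and do not model ER-Mapping or NI-Balancer.

```python
from collections import Counter


def mesh_link_loads(k):
    """Count messages per directed link for a uniform all-to-all on a k x k mesh.

    Assumptions (illustrative, not from the paper): XY dimension-ordered
    routing, and one unit message per ordered (source, destination) pair.
    """
    loads = Counter()
    nodes = [(x, y) for x in range(k) for y in range(k)]
    for sx, sy in nodes:
        for dx, dy in nodes:
            if (sx, sy) == (dx, dy):
                continue  # a device does not send to itself
            x, y = sx, sy
            # Travel along the X dimension first.
            while x != dx:
                nx = x + (1 if dx > x else -1)
                loads[((x, y), (nx, y))] += 1
                x = nx
            # Then travel along the Y dimension.
            while y != dy:
                ny = y + (1 if dy > y else -1)
                loads[((x, y), (x, ny))] += 1
                y = ny
    return loads


loads = mesh_link_loads(4)
busiest = max(loads.values())
quietest = min(loads.values())
print(busiest, quietest)  # prints "16 12"
```

Under these assumptions, links near the center of a 4 x 4 mesh carry 16 messages while links at the border carry 12, so even perfectly uniform all-to-all traffic loads the mesh unevenly.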
