Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

Junsun Choi; Sam Son; Sunjin Choi; Hansung Kim; Yakun Sophia Shao; Scott Shenker; Sylvia Ratnasamy; Borivoje Nikolic

arXiv:2605.00254 · cs.NI · May 4, 2026

TL;DR

This paper analyzes network topologies for cost-effective serving of mixture-of-experts (MoE) large language models, showing that switchless topologies are more cost-effective than conventional scale-up networks.

Contribution

It provides the first systematic cross-layer comparison of network topologies for MoE LLM serving, highlighting the cost-effectiveness of switchless topologies and showing that current link bandwidths are over-provisioned.

Findings

1. Switchless topologies outperform the scale-up topology in cost-effectiveness by 20.6–56.2%.

2. The 3D full-mesh topology is Pareto-optimal in terms of performance and cost.

3. Reducing link bandwidth improves throughput per cost by up to 27%.
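The findings compare designs by throughput per unit cost and by whether they lie on the performance-cost Pareto front. The sketch below shows both calculations under deliberately simple assumptions: each design is reduced to one cost figure and one throughput figure, and the `Design` record, the function names, and all numbers are hypothetical placeholders, not the paper's cost model or measurements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Design:
    name: str
    cost: float        # hypothetical normalized cost (any consistent unit)
    throughput: float  # hypothetical serving throughput (e.g., tokens/s)

def perf_per_cost(d: Design) -> float:
    """Throughput per unit cost, used here as the cost-effectiveness metric."""
    return d.throughput / d.cost

def improvement_pct(candidate: Design, baseline: Design) -> float:
    """Percent gain in throughput per cost of `candidate` over `baseline`."""
    return (perf_per_cost(candidate) / perf_per_cost(baseline) - 1.0) * 100.0

def dominates(a: Design, b: Design) -> bool:
    """True if a costs no more and delivers no less than b, and is strictly better in at least one."""
    return (a.cost <= b.cost and a.throughput >= b.throughput
            and (a.cost < b.cost or a.throughput > b.throughput))

def pareto_front(designs: list[Design]) -> list[Design]:
    """Designs that no other design dominates (a design never dominates itself)."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

if __name__ == "__main__":
    # All four designs and their numbers are made up for illustration.
    designs = [
        Design("A", cost=100.0, throughput=100.0),
        Design("B", cost=60.0, throughput=75.0),
        Design("C", cost=50.0, throughput=50.0),
        Design("D", cost=70.0, throughput=70.0),
    ]
    baseline = designs[0]
    for d in designs[1:]:
        print(f"{d.name}: {improvement_pct(d, baseline):+.1f}% throughput per cost vs. {baseline.name}")
    print("Pareto front:", [d.name for d in pareto_front(designs)])
```

In this toy data, C and D have identical throughput per cost, yet only C is Pareto-optimal, so the two criteria capture different things. The same ratio also illustrates the link-bandwidth finding: if a lower-bandwidth link cuts cost proportionally more than it cuts throughput, throughput per cost rises.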

Abstract

Mixture-of-experts (MoE) architectures have turned LLM serving into a cluster-scale workload in which communication consumes a considerable portion of LLM serving runtime. This has prompted industry to invest heavily in expensive high-bandwidth scale-up networks. We question whether such costly infrastructure is strictly necessary. We present the first systematic cross-layer analysis of network cost-effectiveness for MoE LLM serving, comparing four representative XPU (e.g., GPU/TPU) topologies (scale-up, scale-out, 3D torus, and 3D full-mesh). We find that lower-cost switchless topologies are more cost-effective than the scale-up topology across all serving scenarios explored, improving cost-effectiveness by 20.6-56.2%. In particular, the 3D full-mesh topology is Pareto-optimal in terms of the performance-cost tradeoff. We also find that current scale-up link bandwidths are…
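Two of the compared topologies, 3D torus and 3D full-mesh, link XPUs directly to one another without switches. As a rough illustration of how they differ, the sketch below computes per-node link count (degree), total links, and worst-case hop count for an X×Y×Z arrangement. It assumes a 3D torus has wraparound links to the nearest neighbor in each dimension and that a 3D full-mesh directly links every pair of nodes differing in exactly one coordinate; these are common textbook definitions, not necessarily the exact configurations evaluated in the paper, and the 4×4×4 shape is hypothetical.

```python
from math import prod

def torus_3d(dims):
    """Return (nodes, degree, links, diameter) for an X x Y x Z torus with wraparound links."""
    # Per dimension: 2 neighbors if size > 2, 1 if size == 2 (both directions
    # reach the same node), 0 if size == 1.
    degree = sum(2 if d > 2 else (1 if d == 2 else 0) for d in dims)
    nodes = prod(dims)
    links = nodes * degree // 2           # each bidirectional link is shared by two nodes
    diameter = sum(d // 2 for d in dims)  # worst case: halfway around each ring
    return nodes, degree, links, diameter

def full_mesh_3d(dims):
    """Return (nodes, degree, links, diameter) when all nodes differing in one coordinate are linked."""
    degree = sum(d - 1 for d in dims)         # direct link to every other node on each axis line
    nodes = prod(dims)
    links = nodes * degree // 2
    diameter = sum(1 for d in dims if d > 1)  # at most one hop per dimension that must change
    return nodes, degree, links, diameter

if __name__ == "__main__":
    shape = (4, 4, 4)  # hypothetical 64-XPU arrangement
    print("3D torus    :", torus_3d(shape))
    print("3D full-mesh:", full_mesh_3d(shape))
```

Under these assumptions, the full-mesh halves the worst-case hop count (3 vs. 6) at the price of 50% more links (288 vs. 192) for the same 64 XPUs; how that tradeoff translates into cost and serving throughput depends on the link and packaging cost model, which this sketch does not capture.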
