Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving
Junsun Choi, Sam Son, Sunjin Choi, Hansung Kim, Yakun Sophia Shao, Scott Shenker, Sylvia Ratnasamy, Borivoje Nikolic

TL;DR
This paper analyzes network topologies for cost-effective serving of mixture-of-experts (MoE) large language models and finds that switchless topologies are more cost-effective than traditional switched scale-up networks.
Contribution
It provides the first systematic cross-layer comparison of network topologies for MoE LLM serving, highlighting the cost-effectiveness of switchless topologies and showing that current link bandwidths are over-provisioned.
Findings
Switchless topologies outperform scale-up in cost-effectiveness by 20.6-56.2%.
3D full-mesh topology is Pareto-optimal for performance and cost.
Reducing link bandwidth improves throughput per cost by up to 27%.
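The findings report gains in cost-effectiveness, which the wording above suggests is measured as throughput per unit cost. The sketch below assumes exactly that metric. The helper names `throughput_per_cost` and `relative_improvement` and the example numbers are hypothetical and are not data from the paper. The sketch shows why a cheaper network can win even when it is slower: if a configuration loses 10% of throughput but cuts cost by 30%, its throughput per cost rises by about 28.6%.

```python
def throughput_per_cost(throughput: float, cost: float) -> float:
    """Cost-effectiveness as throughput per unit cost (assumed metric)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return throughput / cost


def relative_improvement(candidate: tuple[float, float],
                         baseline: tuple[float, float]) -> float:
    """Percent gain in throughput per cost of candidate over baseline.

    Each argument is a (throughput, cost) pair in consistent units.
    """
    return 100.0 * (throughput_per_cost(*candidate)
                    / throughput_per_cost(*baseline) - 1.0)


# Hypothetical example: 10% lower throughput, 30% lower cost.
print(round(relative_improvement((0.9, 0.7), (1.0, 1.0)), 1))
```

A positive result means the candidate delivers more throughput for each unit of cost than the baseline, regardless of which one has the higher absolute throughput.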
Abstract
Mixture-of-experts (MoE) architectures have turned LLM serving into a cluster-scale workload in which communication consumes a considerable portion of LLM serving runtime. This has prompted industry to invest heavily in expensive high-bandwidth scale-up networks. We question whether such costly infrastructure is strictly necessary. We present the first systematic cross-layer analysis of network cost-effectiveness for MoE LLM serving, comparing four representative XPU (e.g., GPU/TPU) topologies (scale-up, scale-out, 3D torus, and 3D full-mesh). We find that lower-cost switchless topologies are more cost-effective than the scale-up topology across all serving scenarios explored, improving cost-effectiveness by 20.6-56.2%. In particular, the 3D full-mesh topology is Pareto-optimal in terms of the performance-cost tradeoff. We also find that current scale-up link bandwidths are…
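The 3D torus and 3D full-mesh are both switchless (direct) topologies, because XPUs link to each other without an intervening switch. The sketch below compares their link counts and worst-case hop counts. It rests on two assumptions that the text above does not spell out. First, the 3D torus links each XPU to its ±1 neighbor along each axis, with wraparound. Second, the 3D full-mesh links every pair of XPUs that differ in exactly one coordinate, in the HyperX style. The 4×4×4 grid and the function names are hypothetical illustrations, not the paper's configurations.

```python
from itertools import product


def neighbors_torus(node, dims):
    """Direct neighbors in a 3D torus: +/-1 along each axis, with wraparound."""
    result = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(node)
            n[axis] = (n[axis] + step) % size
            if tuple(n) != node:
                result.add(tuple(n))  # set dedups +1/-1 when size == 2
    return result


def neighbors_full_mesh(node, dims):
    """Direct neighbors in an (assumed HyperX-style) 3D full-mesh."""
    result = set()
    for axis, size in enumerate(dims):
        for v in range(size):
            if v != node[axis]:
                n = list(node)
                n[axis] = v
                result.add(tuple(n))
    return result


def hops(a, b, dims, kind):
    """Minimum number of links between XPUs a and b."""
    if kind == "torus":
        # Shortest way around each ring, summed over the three axes.
        return sum(min(abs(x - y), s - abs(x - y)) for x, y, s in zip(a, b, dims))
    if kind == "full_mesh":
        # One link fixes one differing coordinate.
        return sum(1 for x, y in zip(a, b) if x != y)
    raise ValueError(f"unknown topology: {kind}")


def summarize(dims, kind):
    """XPU count, bidirectional link count, and diameter (max hops)."""
    nodes = list(product(*(range(s) for s in dims)))
    nbr = neighbors_torus if kind == "torus" else neighbors_full_mesh
    links = sum(len(nbr(n, dims)) for n in nodes) // 2
    diameter = max(hops(a, b, dims, kind) for a in nodes for b in nodes)
    return {"xpus": len(nodes), "links": links, "diameter": diameter}


for kind in ("torus", "full_mesh"):
    print(kind, summarize((4, 4, 4), kind))
```

On a 4×4×4 grid, the torus needs 192 links and has a diameter of 6 hops. The full-mesh needs 288 links but reaches any XPU in at most 3 hops. This is the kind of link-count versus hop-count tradeoff that a performance-cost comparison of these topologies has to weigh.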
