Towards Principled Design of Mixture-of-Experts Language Models under Memory and Inference Constraints

Seng Pei Liew; Kenta Shinzato; Yuyang Dong

arXiv:2601.08215·cs.CL·January 14, 2026

TL;DR

This paper investigates the design principles of Mixture-of-Experts language models, revealing that total parameters and expert sparsity are key to optimal performance under memory and inference constraints.

Contribution

The paper proposes a design principle for MoE architectures: maximize total parameters while minimizing expert sparsity and the total number of experts, within the given memory and inference constraints.

Findings

01

Performance is mainly determined by total parameters and expert sparsity.

02

A larger total number of experts slightly reduces performance, because meeting the memory constraint then forces a reduction in core model dimensions (depth and width).

03

A simple design principle is proposed to optimize MoE architecture under constraints.

Abstract

Modern Mixture-of-Experts (MoE) language models are designed based on total parameters (memory footprint) and active parameters (inference cost). However, we find these two factors alone are insufficient to describe an optimal architecture. Through a systematic study, we demonstrate that MoE performance is primarily determined by total parameters ($N_{total}$) and expert sparsity ($s := n_{exp} / n_{topk}$). Moreover, $n_{exp}$ and $n_{topk}$ do not "cancel out" within the sparsity ratio; instead, a larger total number of experts slightly penalizes performance by forcing a reduction in core model dimensions (depth and width) to meet memory constraints. This motivates a simple principle for MoE design which maximizes $N_{total}$ while minimizing $s$ (maximizing $n_{topk}$) and $n_{exp}$ under the given constraints. Our findings provide a robust framework for resolving architectural…
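
To make the stated principle concrete, the following is a minimal sketch of how one might rank candidate configurations under a memory budget (total parameters) and an inference budget (active parameters). Only the definition $s := n_{exp} / n_{topk}$ and the directions "maximize $N_{total}$, minimize $s$ and $n_{exp}$" come from the abstract. Everything else is an assumption made for illustration: the names `MoEConfig`, `param_counts`, and `pick_config` are hypothetical, the parameter-count formula is a rough textbook-style estimate rather than the paper's accounting, and the lexicographic tie-breaking order is one possible reading of the principle.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MoEConfig:
    n_layers: int  # depth
    d_model: int   # width
    n_exp: int     # total experts per MoE layer
    n_topk: int    # experts activated per token

    @property
    def sparsity(self) -> float:
        # s := n_exp / n_topk, as defined in the abstract.
        return self.n_exp / self.n_topk


def param_counts(cfg, d_expert, vocab=32_000):
    """Return (total, active) parameter estimates.

    ASSUMPTION: a rough count (4*d^2 attention weights per layer,
    3*d*d_expert weights per gated-FFN expert, one embedding matrix).
    This is not the paper's formula.
    """
    attn = 4 * cfg.d_model ** 2
    expert = 3 * cfg.d_model * d_expert
    embed = vocab * cfg.d_model
    total = cfg.n_layers * (attn + cfg.n_exp * expert) + embed
    active = cfg.n_layers * (attn + cfg.n_topk * expert) + embed
    return total, active


def pick_config(candidates, max_total, max_active, d_expert=1024):
    """Pick a feasible config, or None if no candidate fits the budgets.

    ASSUMPTION: the principle is applied lexicographically: largest
    N_total first, then lowest sparsity s, then smallest n_exp.
    """
    feasible = []
    for cfg in candidates:
        total, active = param_counts(cfg, d_expert)
        if total <= max_total and active <= max_active:
            feasible.append((cfg, total))
    if not feasible:
        return None
    best = min(feasible, key=lambda ct: (-ct[1], ct[0].sparsity, ct[0].n_exp))
    return best[0]


if __name__ == "__main__":
    # Hypothetical search grid and budgets, chosen only for illustration.
    grid = [
        MoEConfig(layers, width, n_exp, n_topk)
        for layers, width, n_exp, n_topk in product(
            [12, 24], [1024, 2048], [8, 16, 64], [1, 2, 8]
        )
        if n_topk <= n_exp
    ]
    print(pick_config(grid, max_total=8_000_000_000, max_active=1_500_000_000))
```

Note that two candidates with the same $n_{exp}$ but different $n_{topk}$ have the same total parameter count in this sketch, so the sparsity term decides between them whenever the active-parameter budget allows the larger $n_{topk}$.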

Taxonomy

Topics: Topic Modeling · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning