Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts

Jiajie Yang

arXiv:2506.21328 · cs.LG · June 27, 2025

TL;DR

This paper introduces Latent Prototype Routing (LPR), a new expert routing method for Mixture-of-Experts models that significantly improves load balancing and resource utilization without sacrificing model performance.

Contribution

LPR offers a generalized routing framework based on clustering that enhances load balancing in MoE models, addressing a key limitation of existing approaches.

Findings

01. Reduces the Gini coefficient of expert load from 0.70 to 0.035

02. Improves the min-max expert load ratio from 1e-6 to 0.70

03. Achieves near-perfect load balancing in multiple MoE models
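The two load-balance metrics named in the findings can be computed from per-expert token counts. The sketch below uses the standard Gini formula and assumes the min-max ratio means the least-loaded expert's count divided by the most-loaded expert's count; this listing does not give the paper's exact definition. The function names and the example counts are hypothetical.

```python
def gini(loads):
    """Gini coefficient of non-negative loads.

    0.0 means every expert receives the same load; values near 1.0 mean
    the load is concentrated on very few experts.
    """
    xs = sorted(loads)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending values: G = sum_i (2i - n - 1) * x_i / (n * sum(x))
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def min_max_ratio(loads):
    """Least-loaded count over most-loaded count (assumed definition)."""
    hi = max(loads)
    return min(loads) / hi if hi > 0 else 0.0


# Hypothetical token counts for four experts.
balanced = [100, 100, 100, 100]
collapsed = [0, 0, 0, 400]

print(gini(balanced), min_max_ratio(balanced))    # perfectly even load
print(gini(collapsed), min_max_ratio(collapsed))  # one expert takes everything
```

With n experts, the Gini coefficient is at most (n - 1) / n, reached when a single expert takes all tokens, so the collapsed example with four experts gives 0.75.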

Abstract

Mixture-of-Experts (MoE) architectures have emerged as a key strategy for scaling large language models (LLMs) efficiently. However, current MoE systems suffer from severe load imbalance, where only a small subset of experts is consistently activated during training and inference, leading to significant underutilization of model capacity and computational resources. In this work, we revisit expert routing through a clustering perspective and propose Latent Prototype Routing (LPR), a novel routing framework that generalizes existing approaches while promoting balanced expert utilization without compromising downstream performance. Extensive experiments across multiple open-source MoE models -- including DeepSeek-V3, Qwen3-MoE, and Mixtral -- demonstrate that LPR reduces the Gini coefficient of expert load from 0.70 to 0.035 on average, improves the min-max expert load ratio from 1e-6 to…
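The abstract frames routing as a clustering problem but does not give LPR's details here. As a generic illustration of that framing only, and not the paper's method, the sketch below treats each expert as a prototype vector and sends a token to the k prototypes most similar to it. Cosine similarity, top-k selection, and every name in the block are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def route_top_k(token, prototypes, k=2):
    """Return the indices of the k prototypes most similar to the token.

    Hypothetical helper: each prototype stands in for one expert, so routing
    becomes a nearest-cluster assignment in the token's latent space.
    """
    scores = [cosine(token, p) for p in prototypes]
    ranked = sorted(range(len(prototypes)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Three hypothetical 2-D expert prototypes and one token representation.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
token = [0.9, 0.1]
print(route_top_k(token, prototypes, k=2))
```

In this view, load imbalance corresponds to a few prototypes capturing most tokens, which is the kind of skew that the Gini coefficient and min-max ratio quantify.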

Code & Models

Repositories

rando11199/latentprototyperouter (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification · Topic Modeling · Recommender Systems and Techniques

Methods: Mixture of Experts