L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts

Minghao Yang; Ren Togo; Guang Li; Takahiro Ogawa; Miki Haseyama

arXiv:2601.21349 · cs.LG · May 15, 2026

TL;DR

L2R is a unified routing framework for Mixture-of-Experts models. It improves expert discrimination and model performance by reshaping the routing space and controlling the scoring geometry.

Contribution

The paper proposes Low-rank & Lipschitz-controlled Routing (L2R), which combines a shared low-rank latent routing space with Lipschitz-controlled scoring (Saturated Inner-Product Scoring, SIPS) to make MoE routing smoother and more stable.

Findings

01 L2R improves routing geometry and expert discrimination.

02 L2R enhances overall model performance on language and vision tasks.

03 Experiments demonstrate consistent benefits across different MoE settings.

Abstract

Mixture-of-Experts (MoE) models scale neural networks by conditionally activating a small subset of experts, where the router plays a central role in determining expert specialization and overall model performance. However, many modern MoE systems still adopt linear routers in raw high-dimensional representation spaces, where representation mismatch, angular concentration, and scale-sensitive scoring can jointly undermine routing discriminability and stable expert specialization. In this work, we propose Low-rank & Lipschitz-controlled Routing (L2R), a unified routing framework that reshapes both the routing space and scoring geometry. L2R performs expert assignment in a shared low-rank latent routing space and introduces Saturated Inner-Product Scoring (SIPS) to explicitly control the Lipschitz behavior of routing functions, yielding smoother and more stable routing geometry. In…
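
The two ideas named in the abstract, routing in a low-rank latent space and scoring with a bounded inner product, can be illustrated with a minimal sketch. The abstract does not give the SIPS formula, so the `tau * tanh(dot / tau)` score below is a stand-in assumption chosen only because it is bounded and 1-Lipschitz in the inner product; it is not the paper's definition. The names `down_proj`, `expert_keys`, `saturated_score`, `route`, and `tau`, along with the toy sizes, are hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def saturated_score(z, key, tau=1.0):
    """Bounded inner-product score: tau * tanh(<z, key> / tau).

    ASSUMPTION: tanh is a stand-in saturation, not the paper's SIPS formula.
    The magnitude never exceeds tau, and the slope with respect to the
    inner product is at most 1.
    """
    dot = sum(a * b for a, b in zip(z, key))
    return tau * math.tanh(dot / tau)


def route(x, down_proj, expert_keys, top_k=2, tau=1.0):
    """Return (expert indices, gate weights) for one token vector x."""
    z = matvec(down_proj, x)  # project d_model -> low-rank routing space
    scores = [saturated_score(z, k, tau) for k in expert_keys]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the selected experts only; scores are bounded, so exp is safe.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


# Toy sizes (hypothetical): d_model=16, rank=4, 8 experts.
rng = random.Random(0)
d_model, rank, n_experts = 16, 4, 8
down_proj = [[rng.gauss(0, 1 / math.sqrt(d_model)) for _ in range(d_model)]
             for _ in range(rank)]
expert_keys = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_experts)]
x = [rng.gauss(0, 1) for _ in range(d_model)]

experts, gates = route(x, down_proj, expert_keys)
```

Because the score saturates, multiplying `x` by a large constant cannot push any routing logit beyond `tau` in magnitude, which is one simple way to picture the scale sensitivity the abstract attributes to plain linear routers.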
