Routing-Free Mixture-of-Experts
Yilun Liu, Jinru Han, Sikuan Yan, Volker Tresp, Yunpu Ma

TL;DR
This paper introduces Routing-Free MoE, a Mixture-of-Experts architecture that removes centralized routing so that each expert decides its own activation. It pairs this with an adaptive load-balancing framework, and experiments report better scalability and robustness than baselines.
Contribution
The paper proposes an MoE architecture without centralized routing: expert activation is optimized directly through continuous gradient flow, and a unified adaptive load-balancing framework interpolates between expert-balancing and token-balancing objectives.
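To make the idea of self-activating experts concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the class `SelfGatedExpert`, the sigmoid gate, and the function `routing_free_layer` are all hypothetical names and choices made for illustration. The only properties taken from the summary are that there is no shared router, no Softmax across experts, and no Top-K selection, and that each expert's gate is a smooth function that gradients could flow through.

```python
import math
import random


class SelfGatedExpert:
    """Hypothetical expert that owns its gate; no shared router is involved."""

    def __init__(self, dim, rng):
        # Expert transform: a single linear map (dim -> dim), illustrative only.
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        # Gate parameters live inside the expert (an assumption of this sketch).
        self.gate_weight = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.gate_bias = 0.0

    def gate(self, x):
        # Sigmoid of the expert's own score: smooth, so it is differentiable.
        score = sum(w * v for w, v in zip(self.gate_weight, x)) + self.gate_bias
        return 1.0 / (1.0 + math.exp(-score))

    def forward(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


def routing_free_layer(x, experts):
    """Sum expert outputs, each weighted by that expert's independent gate."""
    out = [0.0] * len(x)
    gates = []
    for expert in experts:
        g = expert.gate(x)  # decided by this expert alone
        gates.append(g)
        y = expert.forward(x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, gates


rng = random.Random(0)
experts = [SelfGatedExpert(4, rng) for _ in range(3)]
token = [0.5, -1.0, 0.25, 2.0]
output, gates = routing_free_layer(token, experts)
```

Because the gates are not normalized against each other, they need not sum to one, and changing one expert's gate parameters leaves the other experts' gate values untouched. A real implementation would presumably skip experts whose gate is effectively zero to save compute; this sketch evaluates every expert for simplicity.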
Findings
Routing-Free MoE consistently outperforms baselines, with better scalability and robustness.
The model simplifies MoE design by removing external routers and hard-coded mechanisms.
Extensive experiments validate the effectiveness of the proposed approach.
Abstract
Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE, which eliminates any hard-coded centralized designs, including external routers, Softmax, Top-K, and load balancing, instead encapsulating all activation functionalities within individual experts and directly optimized through continuous gradient flow, enabling each expert to determine its activation entirely on its own. We introduce a unified adaptive load-balancing framework to simultaneously optimize both expert-balancing and token-balancing objectives through a configurable interpolation, allowing flexible and customizable resource allocation. Extensive experiments show that Routing-Free MoE can consistently outperform baselines with better scalability and robustness. We analyze its behavior in detail and offer insights that may facilitate…
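The abstract does not spell out the load-balancing objective, so the following is only a sketch of what a "configurable interpolation" between expert balancing and token balancing could look like. The function `balance_loss`, the variance-based penalties, and the parameter `alpha` are assumptions introduced here, not definitions from the paper.

```python
def balance_loss(activations, alpha):
    """Hypothetical interpolated balancing penalty.

    activations[t][e] is the gate value of expert e on token t.
    alpha in [0, 1]: 1.0 penalizes only expert imbalance,
    0.0 penalizes only token imbalance.
    """
    n_tokens = len(activations)
    n_experts = len(activations[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    # Expert balancing: each expert should carry a similar share of the batch.
    expert_load = [
        sum(row[e] for row in activations) / n_tokens for e in range(n_experts)
    ]
    # Token balancing: each token should receive a similar amount of expert compute.
    token_load = [sum(row) / n_experts for row in activations]

    return alpha * variance(expert_load) + (1.0 - alpha) * variance(token_load)


# Two tokens, two experts: every token uses only expert 0.
skewed_experts = [[1.0, 0.0], [1.0, 0.0]]
# One token uses both experts, the other uses none.
skewed_tokens = [[1.0, 1.0], [0.0, 0.0]]
```

With `alpha = 1.0` only the first pattern is penalized, and with `alpha = 0.0` only the second is, so intermediate values of `alpha` trade one objective against the other. Variance is just one convenient imbalance measure; other penalties would fit the same interpolation.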
