Expert-Token Resonance MoE: Bidirectional Routing with Efficiency Affinity-Driven Active Selection
Jing Li, Zhijie Sun, Dachao Lin, Xuan He, Binfan Zheng, Yi Lin, Rongqian Zhao, Xin Chen

TL;DR
This paper introduces Expert-Token Resonance (ETR), a bidirectional routing mechanism for MoE models that improves training efficiency and reduces redundancy by adaptively coordinating token and expert interactions.
Contribution
ETR presents a theoretically grounded, dynamic routing approach with affinity-based architecture, bidirectional selection, and adaptive capacity, addressing inefficiencies and homogenization in MoE models.
Findings
Achieves up to 46.6% training efficiency improvement
Gains 9.7%-14.5% performance across multiple benchmarks
Reduces expert capacity lower bound by up to 40%
Abstract
Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models by activating only a subset of parameters per input. However, existing MoE models suffer from two critical limitations: (1) inefficient token-to-expert routing that causes excessive communication overhead, and (2) expert homogenization that leads to redundant computations. Current approaches address these challenges separately, failing to achieve simultaneous improvements in both training efficiency and model performance. We present Expert-Token Resonance (ETR), a theoretically-grounded bidirectional routing mechanism that fundamentally reimagines expert-token interactions in MoE architectures. Our key insight is that optimal routing requires adaptive coordination between token-choice routing (TCR) during early training phases and expert-choice routing (ECR) in later stages. We prove that this…
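The abstract contrasts token-choice routing (TCR) with expert-choice routing (ECR). As a rough illustration of that distinction only, the sketch below implements the two generic selection rules over a small affinity matrix. It is not the ETR mechanism from the paper: the function names, the `affinity` values, `k`, and `capacity` are all hypothetical and chosen for the example.

```python
# Illustrative sketch: generic token-choice vs. expert-choice routing.
# This is NOT the paper's ETR algorithm; all names and numbers are assumptions.

def token_choice(affinity, k):
    """Each token selects its k highest-affinity experts.

    affinity[t][e] is a hypothetical token-to-expert score.
    Returns a list of (token, expert) assignments.
    """
    pairs = []
    for t, row in enumerate(affinity):
        ranked = sorted(range(len(row)), key=lambda e: row[e], reverse=True)
        pairs.extend((t, e) for e in ranked[:k])
    return pairs


def expert_choice(affinity, capacity):
    """Each expert selects its `capacity` highest-affinity tokens."""
    num_tokens = len(affinity)
    num_experts = len(affinity[0])
    pairs = []
    for e in range(num_experts):
        ranked = sorted(range(num_tokens), key=lambda t: affinity[t][e], reverse=True)
        pairs.extend((t, e) for t in ranked[:capacity])
    return pairs


def expert_load(pairs, num_experts):
    """Count how many tokens each expert receives."""
    load = [0] * num_experts
    for _, e in pairs:
        load[e] += 1
    return load


# Four tokens, three experts; every token scores expert 0 highest.
affinity = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.7, 0.3, 0.2],
    [0.6, 0.1, 0.5],
]

tcr = token_choice(affinity, k=1)
ecr = expert_choice(affinity, capacity=2)

print(expert_load(tcr, 3))  # token choice: all tokens pile onto expert 0
print(expert_load(ecr, 3))  # expert choice: load is fixed by the capacity
```

In this toy input, token choice sends every token to the same expert, while expert choice fixes each expert's load at its capacity. In general, expert choice gives no guarantee that every token is selected by some expert. How ETR coordinates the two rules over the course of training is not covered by this sketch.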
Taxonomy
Topics: Network Packet Processing and Optimization · IPv6, Mobility, Handover, Networks, Security · Algorithms and Data Compression
Methods: Mixture of Experts
