MoE Lens -- An Expert Is All You Need
Marmik Chaudhari, Idhant Gulati, Nishkal Hundia, Pranav Karra, Shivam Raval

TL;DR
This paper systematically analyzes expert specialization in Mixture of Experts models, revealing that models rely heavily on a few experts, which suggests opportunities for inference optimization and understanding learned knowledge localization.
Contribution
The paper introduces a two-part approach to analyzing expert specialization in MoEs, combining routing pattern analysis with an early decoding framework, and validates it empirically on DeepSeekMoE.
Findings
A few experts handle over 50% of routing decisions.
Cosine similarity between the top-weighted expert's output and the full ensemble's output is high (up to 0.95).
Across domains, perplexity increases by only 5% when only a single expert is used.
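As a rough illustration of the first finding, the sketch below measures how concentrated routing is: given the expert ids chosen for each token, it reports the share of all routing decisions that go to the m most-used experts. The function name `top_expert_share` and the toy routing data are assumptions made for this example; they are not taken from the paper or from the DeepSeekMoE code.

```python
from collections import Counter


def top_expert_share(routed, m):
    """Fraction of all routing decisions handled by the m most-used experts.

    routed: list of per-token lists of expert ids (each token's top-k choices).
    m: number of most-used experts to include in the share.
    """
    # Count how often each expert id appears across all tokens.
    counts = Counter(e for token_experts in routed for e in token_experts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum the counts of the m most frequently selected experts.
    top = sum(c for _, c in counts.most_common(m))
    return top / total


# Toy data (made up for illustration): 4 tokens, each routed to 2 experts.
toy_routes = [[0, 1], [0, 2], [0, 1], [3, 0]]
share = top_expert_share(toy_routes, m=2)
print(f"share handled by the 2 busiest experts: {share:.2f}")
```

On real data, `routed` would be collected from the router's top-k selections at a given layer for a given domain, which is a setup this sketch does not cover.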
Abstract
Mixture of Experts (MoE) models enable parameter-efficient scaling through sparse expert activations, yet optimizing their inference and memory costs remains challenging due to limited understanding of their specialization behavior. We present a systematic analysis of expert specialization in MoEs through two complementary approaches: domain-specific routing patterns and an early decoding framework that tracks expert contributions to output representations. Our analysis of the DeepSeekMoE model reveals that despite having 64 routed experts with 6 active for each layer's computation, the model predominantly relies on a few specialized experts, with the top-weighted expert's output closely approximating the full ensemble prediction. We quantitatively validate these findings through a systematic analysis of the token routing distribution, demonstrating that very few experts handle over…
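The claim that the top-weighted expert's output approximates the full ensemble can be checked, in a simplified form, by comparing the two vectors with cosine similarity. The sketch below is a minimal version under stated assumptions: expert outputs are plain lists of floats, the ensemble is the gate-weighted sum of the active experts' outputs, and the comparison is made directly on those vectors. The paper's early decoding framework is not reproduced here, and all names (`cosine`, `ensemble_output`, `top_expert_vs_ensemble`) and the toy values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def ensemble_output(expert_outputs, gate_weights):
    """Gate-weighted sum of the active experts' output vectors."""
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(gate_weights, expert_outputs))
        for i in range(dim)
    ]


def top_expert_vs_ensemble(expert_outputs, gate_weights):
    """Cosine similarity between the top-weighted expert and the ensemble."""
    # Index of the expert with the largest gate weight.
    top = max(range(len(gate_weights)), key=gate_weights.__getitem__)
    combined = ensemble_output(expert_outputs, gate_weights)
    return cosine(expert_outputs[top], combined)


# Toy example (made up): two active experts with orthogonal outputs.
toy_outputs = [[1.0, 0.0], [0.0, 1.0]]
toy_gates = [0.8, 0.2]
sim = top_expert_vs_ensemble(toy_outputs, toy_gates)
print(f"cosine(top expert, ensemble) = {sim:.3f}")
```

The more the gate weight concentrates on one expert, the closer this similarity gets to 1, which matches the intuition behind the reported result.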
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning · Expert finding and Q&A systems
