How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts
Sumin Park, Noseong Park

TL;DR
This paper introduces MASS, a novel framework for optimizing the number and specialization of experts in Sparse Mixture-of-Experts models, improving semantic differentiation and performance across domains.
Contribution
MASS employs a gradient-based semantic drift detector and adaptive routing to dynamically expand and specialize experts, addressing limitations of prior SMoE approaches.
Findings
MASS converges to an optimal expert balance in synthetic tests.
MASS outperforms strong MoE baselines on language and vision datasets.
Enhanced semantic specialization improves model performance.
Abstract
Finding the optimal configuration of Sparse Mixture-of-Experts (SMoE) that maximizes semantic differentiation among experts is essential for exploiting the full potential of MoE architectures. However, existing SMoE frameworks either rely heavily on hyperparameter tuning or overlook the importance of diversifying semantic roles across experts when adapting the expert pool size. We propose Mixture-of-Experts for Adaptive Semantic Specialization (MASS), a semantic-aware MoE framework for adaptive expert expansion and dynamic routing. MASS introduces two key advancements: (i) a gradient-based semantic drift detector that prompts targeted expert expansion when the existing expert pool lacks the capacity to capture the full semantic diversity of the data, and (ii) an adaptive routing strategy that dynamically adjusts expert usage based on token-level routing confidence mass. We…
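The abstract does not spell out how routing confidence mass maps to expert usage. As a rough illustration only, the sketch below assumes a top-p style rule: for each token, experts are added in order of decreasing router probability until their cumulative probability reaches a threshold. The function name `select_experts_by_mass`, the `mass_threshold` parameter, and the default value 0.7 are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_experts_by_mass(router_logits, mass_threshold=0.7):
    """Pick the smallest set of experts whose cumulative routing
    probability reaches mass_threshold (an assumed, top-p style rule).

    Returns a dict mapping expert index to renormalized gate weight.
    """
    probs = softmax(router_logits)
    # Visit experts from most to least probable for this token.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    selected = []
    cumulative = 0.0
    for i in order:
        selected.append(i)
        cumulative += probs[i]
        if cumulative >= mass_threshold:
            break

    # Renormalize so the gate weights of the chosen experts sum to 1.
    norm = sum(probs[i] for i in selected)
    return {i: probs[i] / norm for i in selected}


# A token the router is confident about activates a single expert.
confident = select_experts_by_mass([10.0, 0.0, 0.0, 0.0])

# A token with a flat routing distribution activates several experts.
uncertain = select_experts_by_mass([1.0, 1.0, 1.0, 1.0])
```

Under this assumed rule, the number of active experts varies per token: a sharply peaked routing distribution uses one expert, while a flat one spreads computation across more.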
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
