Guiding the Experts: Semantic Priors for Efficient and Focused MoE Routing
Chengxi Min, Wei Wang, Yahui Liu, Weixin Ye, Enver Sangineto, Qi Wang, Yao Zhao

TL;DR
This paper introduces a semantic-aware enhancement for Soft MoE models, aligning expert routing with semantic regions to improve efficiency, interpretability, and performance in vision tasks.
Contribution
It proposes a foreground-guided auxiliary loss and a LayerScale mechanism that explicitly incorporate semantic priors into MoE routing, improving interpretability and accuracy.
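LayerScale originates in CaiT (Touvron et al.). Each residual branch output is multiplied by a learnable per-channel vector that is initialized to a small value. The NumPy sketch below shows only that generic mechanism. Where the paper places LayerScale inside the Soft MoE layer, and what initial value it uses, are not stated here, so `init_value=1e-5`, the `tanh` branch, and the class name `LayerScaleResidual` are illustrative assumptions.

```python
import numpy as np


class LayerScaleResidual:
    """Residual block with LayerScale: y = x + gamma * f(x).

    gamma has shape (dim,) and would be learnable in a real model; it starts
    at a small constant so the block begins close to the identity mapping.
    (Hypothetical helper for illustration, not the paper's code.)
    """

    def __init__(self, dim: int, branch, init_value: float = 1e-5):
        self.gamma = np.full(dim, init_value)
        self.branch = branch  # any callable mapping (tokens, dim) -> (tokens, dim)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Broadcasting scales every token's branch output channel-wise by gamma.
        return x + self.gamma * self.branch(x)


rng = np.random.default_rng(0)
dim = 8
w = rng.standard_normal((dim, dim))
block = LayerScaleResidual(dim, branch=lambda t: np.tanh(t @ w))  # placeholder branch

x = rng.standard_normal((4, dim))  # 4 tokens
y = block(x)
print(y.shape)                         # (4, 8)
print(np.allclose(y, x, atol=1e-4))    # True: the tiny gamma keeps y close to x
```

Because `gamma` starts near zero, each residual block initially contributes only a small update. This is the usual rationale for LayerScale's stabilizing effect on information flow in deep models.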
Findings
Improved accuracy on ImageNet-1K and other benchmarks.
More interpretable expert routing patterns.
Seamless integration with existing Soft MoE frameworks.
Abstract
Mixture-of-Experts (MoE) models have emerged as a promising direction for scaling vision architectures efficiently. Among them, Soft MoE improves training stability by assigning each token to all experts via continuous dispatch weights. However, current designs overlook the semantic structure that is implicitly encoded in these weights, resulting in suboptimal expert routing. In this paper, we discover that dispatch weights in Soft MoE inherently exhibit segmentation-like patterns but are not explicitly aligned with semantic regions. Motivated by this observation, we propose a foreground-guided enhancement strategy. Specifically, we introduce a spatially aware auxiliary loss that encourages expert activation to align with semantic foreground regions. To further reinforce this supervision, we integrate a lightweight LayerScale mechanism that improves information flow and stabilizes…
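The abstract's dispatch weights can be illustrated with a minimal NumPy sketch of the original Soft MoE routing (Puigcerver et al.). In that scheme, dispatch weights are a softmax over tokens for each slot, and combine weights are a softmax over slots for each token. The `foreground_alignment_loss` below is a hypothetical stand-in, because the abstract does not give the paper's loss formula. It assumes a binary foreground mask per patch is available and penalizes slots whose dispatch mass falls on background patches. All function names, the slot layout, and the random linear "experts" are illustrative assumptions.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def soft_moe_weights(x: np.ndarray, phi: np.ndarray):
    """Soft MoE routing weights.

    x:   (n_tokens, d) patch-token features
    phi: (d, n_slots) learnable slot parameters
    """
    logits = x @ phi                    # (n_tokens, n_slots)
    dispatch = softmax(logits, axis=0)  # each column sums to 1: a slot mixes all tokens
    combine = softmax(logits, axis=1)   # each row sums to 1: a token mixes all slots
    return dispatch, combine


def foreground_alignment_loss(dispatch: np.ndarray, fg_mask: np.ndarray, eps: float = 1e-8) -> float:
    """HYPOTHETICAL auxiliary loss, not the paper's formula.

    fg_mask: (n_tokens,) with 1.0 for foreground patches and 0.0 for background.
    Each dispatch column sums to 1, so fg_mass[j] is the fraction of slot j's
    input drawn from foreground tokens. The loss approaches 0 as that fraction
    approaches 1 for every slot.
    """
    fg_mass = fg_mask @ dispatch        # (n_slots,)
    return float(-np.mean(np.log(fg_mass + eps)))


rng = np.random.default_rng(0)
n_tokens, d, n_experts, slots_per_expert = 16, 32, 4, 2  # 16 tokens = a 4x4 patch grid
n_slots = n_experts * slots_per_expert

x = rng.standard_normal((n_tokens, d))
phi = rng.standard_normal((d, n_slots)) / np.sqrt(d)
dispatch, combine = soft_moe_weights(x, phi)

# Slot inputs, expert processing (one random linear map per expert), recombination.
slot_inputs = dispatch.T @ x        # (n_slots, d)
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
slot_outputs = np.concatenate([
    slot_inputs[e * slots_per_expert:(e + 1) * slots_per_expert] @ experts[e]
    for e in range(n_experts)
])                                  # (n_slots, d)
y = combine @ slot_outputs          # (n_tokens, d)

# Per-token activation of each expert: sum dispatch weights over that expert's slots.
expert_activation = dispatch.reshape(n_tokens, n_experts, slots_per_expert).sum(axis=2)
expert0_map = expert_activation[:, 0].reshape(4, 4)  # spatial map for expert 0

fg_mask = np.zeros(n_tokens)
fg_mask[[5, 6, 9, 10]] = 1.0        # hypothetical foreground: the central 2x2 patches
loss = foreground_alignment_loss(dispatch, fg_mask)
```

Viewing `expert0_map` (or the corresponding map for each expert) as a heatmap over the patch grid is one way to inspect the segmentation-like dispatch patterns described above. In training, an auxiliary term like this would be added to the task loss with a weighting coefficient, and that weighting is also an assumption here.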
Taxonomy
Topics: Cognitive Computing and Networks
Methods: Mixture of Experts · LayerScale · ALIGN · Attentive Walk-Aggregating Graph Neural Network
