Guiding the Experts: Semantic Priors for Efficient and Focused MoE Routing

Chengxi Min; Wei Wang; Yahui Liu; Weixin Ye; Enver Sangineto; Qi Wang; Yao Zhao

arXiv:2505.18586 · cs.CV · May 27, 2025

Open Access · 1 repository

TL;DR

This paper introduces a semantic-aware enhancement for Soft MoE models, aligning expert routing with semantic regions to improve efficiency, interpretability, and performance in vision tasks.

Contribution

The paper proposes a foreground-guided auxiliary loss and a LayerScale mechanism that explicitly incorporate semantic priors into Soft MoE routing, improving interpretability and accuracy.
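As a rough illustration of the LayerScale idea mentioned above, the sketch below scales a residual branch by a per-channel vector before adding it back to the input. The function name `layer_scale_residual`, the initial value `1e-4`, and the toy shapes are assumptions made for this example; where the paper places LayerScale inside the Soft MoE block is not stated in this summary.

```python
import numpy as np

def layer_scale_residual(x, branch_out, gamma):
    """Residual update where the branch output is scaled per channel.

    x, branch_out: arrays of shape [n_tokens, dim]
    gamma: per-channel scale of shape [dim] (learnable in a real model)
    """
    return x + gamma * branch_out

dim = 4
x = np.ones((3, dim))                # toy token features
branch_out = np.full((3, dim), 2.0)  # stand-in for an MoE or MLP branch output
gamma = np.full(dim, 1e-4)           # assumed small initial value, hypothetical
y = layer_scale_residual(x, branch_out, gamma)
```

With a small `gamma`, the block starts close to the identity mapping, which is the usual motivation for this kind of scaling.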

Findings

01. Improved accuracy on ImageNet-1K and other benchmarks.
02. More interpretable expert routing patterns.
03. Seamless integration with existing Soft MoE frameworks.

Abstract

Mixture-of-Experts (MoE) models have emerged as a promising direction for scaling vision architectures efficiently. Among them, Soft MoE improves training stability by assigning each token to all experts via continuous dispatch weights. However, current designs overlook the semantic structure which is implicitly encoded in these weights, resulting in suboptimal expert routing. In this paper, we discover that dispatch weights in Soft MoE inherently exhibit segmentation-like patterns but are not explicitly aligned with semantic regions. Motivated by this observation, we propose a foreground-guided enhancement strategy. Specifically, we introduce a spatially aware auxiliary loss that encourages expert activation to align with semantic foreground regions. To further reinforce this supervision, we integrate a lightweight LayerScale mechanism that improves information flow and stabilizes…
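To make the notion of dispatch weights concrete, here is a minimal NumPy sketch, assuming the standard Soft MoE formulation in which token-slot logits are normalized over tokens to give dispatch weights and over slots to give combine weights. The function `foreground_alignment_loss` is a hypothetical stand-in that rewards dispatch mass landing on foreground tokens; it is not the paper's loss, whose exact form is not given in the abstract. All names, shapes, and the random foreground mask are assumptions for illustration.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_weights(tokens, phi):
    """Return (dispatch, combine) weights, each of shape [n_tokens, n_slots]."""
    logits = tokens @ phi
    dispatch = softmax(logits, axis=0)  # each slot's weights sum to 1 over tokens
    combine = softmax(logits, axis=1)   # each token's weights sum to 1 over slots
    return dispatch, combine

def foreground_alignment_loss(dispatch, fg_mask, eps=1e-8):
    """Hypothetical auxiliary loss: negative log of the dispatch mass that
    each slot places on foreground tokens, averaged over slots."""
    fg_mass = (dispatch * fg_mask[:, None]).sum(axis=0)  # shape [n_slots]
    return float(-np.log(fg_mass + eps).mean())

rng = np.random.default_rng(0)
n_tokens, dim, n_slots = 16, 8, 4
tokens = rng.normal(size=(n_tokens, dim))  # toy patch embeddings
phi = rng.normal(size=(dim, n_slots))      # toy slot parameters
fg_mask = np.zeros(n_tokens)
fg_mask[:6] = 1.0                          # pretend the first 6 patches are foreground

dispatch, combine = soft_moe_weights(tokens, phi)
slot_inputs = dispatch.T @ tokens          # each slot is a weighted mix of all tokens
loss = foreground_alignment_loss(dispatch, fg_mask)
```

Reshaping one column of `dispatch` back to the patch grid gives a per-slot heat map over the image, which is the kind of segmentation-like pattern the abstract refers to.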

Code & Models

Repositories

0930mcx/guiding-experts (PyTorch, official)

Taxonomy

Topics: Cognitive Computing and Networks

Methods: Mixture of Experts · LayerScale · ALIGN · Attentive Walk-Aggregating Graph Neural Network