SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

Zi-Hao Bo; Yaqian Li; Anzhou Hou; Rinyoichi Takezoe; Ertao Zhao; Tianxiang Pan; Jiale Yan; Mo Guang; Kaiwen Long

arXiv:2604.23996 · cs.CV · April 28, 2026

TL;DR

This paper introduces SMoES, a new modality-guided expert routing method for MoE-based vision-language models that improves task performance and deployment efficiency by leveraging layer-dependent modality fusion patterns.

Contribution

SMoES combines three components to enhance expert specialization in MoE-VLMs: dynamic soft modality scores, an expert binning mechanism, and inter-bin mutual information regularization.

Findings

01 Achieves average gains of 0.9% on multimodal tasks and 4.2% on language tasks.

02 Reduces expert-parallel (EP) communication overhead by 56.1%.

03 Improves deployment throughput by 12.3%.

Abstract

Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent modality fusion patterns in MoE-VLMs and provide little guidance for expert specialization. We propose Soft Modality-guided Expert Specialization (SMoES), which consists of dynamic soft modality scores that capture layer-dependent fusion patterns, an expert binning mechanism aligned with expert-parallel deployment, and an inter-bin mutual information regularization that encourages coherent modality specialization. Our method leverages attention-based or Gaussian-statistics modality scores to optimize mutual information regularization. Experiments across four MoE-based VLMs…
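
The abstract does not give the exact form of the inter-bin mutual information term, so the following is only a minimal sketch of one plausible reading: experts are grouped into contiguous, equally sized bins, each token carries a soft modality score (for example, a vision/text probability pair), and the regularizer measures the mutual information between modality and bin. The function names `bin_probs` and `modality_bin_mi`, the contiguous binning, and the choice to treat modality scores as per-token probabilities are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def bin_probs(router_probs: np.ndarray, num_bins: int) -> np.ndarray:
    """Sum per-expert routing probabilities into contiguous expert bins.

    router_probs: (T, E) array, each row a distribution over E experts.
    Returns a (T, num_bins) array, each row a distribution over bins.
    Contiguous equal-size bins are an assumption of this sketch.
    """
    num_tokens, num_experts = router_probs.shape
    if num_experts % num_bins != 0:
        raise ValueError("num_experts must be divisible by num_bins")
    per_bin = num_experts // num_bins
    return router_probs.reshape(num_tokens, num_bins, per_bin).sum(axis=2)


def modality_bin_mi(
    modality_scores: np.ndarray,
    router_probs: np.ndarray,
    num_bins: int,
    eps: float = 1e-12,
) -> float:
    """Mutual information (in nats) between soft modality and expert bin.

    modality_scores: (T, M) array, each row a soft distribution over M
    modalities (hypothetical stand-in for the paper's modality scores).
    """
    q = bin_probs(router_probs, num_bins)                     # (T, B)
    joint = modality_scores.T @ q / modality_scores.shape[0]  # (M, B)
    p_mod = joint.sum(axis=1, keepdims=True)                  # (M, 1)
    p_bin = joint.sum(axis=0, keepdims=True)                  # (1, B)
    ratio = joint / (p_mod * p_bin + eps)
    # Zero-probability cells contribute 0 because joint multiplies the log.
    return float(np.sum(joint * np.log(ratio + eps)))


# Toy case: 4 tokens, 2 modalities, 4 experts grouped into 2 bins.
scores = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
separated = np.array(
    [[0.5, 0.5, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0.0, 1.0]]
)
uniform = np.full((4, 4), 0.25)

mi_separated = modality_bin_mi(scores, separated, num_bins=2)  # about log(2)
mi_uniform = modality_bin_mi(scores, uniform, num_bins=2)      # about 0
```

Under this reading, a training objective would subtract a weighted copy of the mutual information from the task loss, so that routing which sends each modality to its own bin is rewarded. Whether SMoES uses this sign convention, this joint distribution, or hard rather than soft bin assignments is not stated in the text above.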
