AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert

Yuting Gao; Wang Lan; Hengyuan Zhao; Linjiang Huang; Si Liu; Qingpei Guo

arXiv:2511.18314 · cs.LG · November 25, 2025

Open Access

TL;DR

AnyExperts introduces a dynamic, importance-aware expert routing framework for multimodal MoE models, optimizing resource allocation and maintaining high performance across vision, audio, and NLP tasks.

Contribution

It proposes a novel on-demand, budget-aware routing strategy that adaptively allocates real and virtual experts based on semantic importance, improving efficiency.

Findings

01 Achieves 40% fewer real expert activations on image/video tasks.

02 Maintains performance while reducing real expert usage by 10% on text-dense tasks.

03 Enhances efficiency and effectiveness of multimodal MoE models.

Abstract

Multimodal Mixture-of-Experts (MoE) models offer a promising path toward scalable and efficient large vision-language systems. However, existing approaches rely on rigid routing strategies, typically activating a fixed number of experts per token, and so ignore the inherent heterogeneity in semantic importance across modalities. This leads to suboptimal compute allocation, where redundant tokens consume as many resources as critical ones. To address this, we propose AnyExperts, a novel on-demand, budget-aware dynamic routing framework that allocates a variable total number of expert slots per token based on its semantic importance. Crucially, to prevent uncontrolled compute growth, the total slots per token are constrained within a fixed range, and each slot is filled by either a real expert or a virtual expert, with the virtual share capped at a small maximum (e.g., 20%). The model then…
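The slot allocation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`allocate_slots`, `fill_slots`), the linear mapping from importance to slot count, the slot range of 2 to 8, and the greedy fill by gate score are all assumptions made for illustration. Only the bounded slot range, the real/virtual split, and the 20% example cap come from the abstract.

```python
import math


def allocate_slots(importance, min_slots=2, max_slots=8):
    """Map an importance score to a slot count within a fixed range.

    Assumption: importance is a scalar, clamped to [0, 1], and mapped
    linearly onto [min_slots, max_slots]. The paper's rule may differ.
    """
    s = min(max(importance, 0.0), 1.0)
    return min_slots + round(s * (max_slots - min_slots))


def fill_slots(real_scores, virtual_scores, total_slots, max_virtual_share=0.2):
    """Fill a token's slots with real or virtual experts.

    Assumption: candidates are taken greedily in descending gate score,
    and a virtual expert is skipped once the virtual cap is reached.
    Returns a list of (kind, expert_index) pairs.
    """
    # Cap on virtual slots, e.g. 20% of the token's total slots.
    max_virtual = math.floor(max_virtual_share * total_slots)

    candidates = [(score, "real", i) for i, score in enumerate(real_scores)]
    candidates += [(score, "virtual", i) for i, score in enumerate(virtual_scores)]
    candidates.sort(key=lambda c: c[0], reverse=True)

    chosen = []
    n_virtual = 0
    for _score, kind, idx in candidates:
        if len(chosen) == total_slots:
            break
        if kind == "virtual":
            if n_virtual >= max_virtual:
                continue  # virtual budget exhausted, try the next candidate
            n_virtual += 1
        chosen.append((kind, idx))
    return chosen


# Hypothetical gate scores for one token: four real experts, two virtual ones.
slots = fill_slots(
    real_scores=[0.9, 0.1, 0.5, 0.3],
    virtual_scores=[0.8, 0.7],
    total_slots=5,
)
print(slots)
```

In this sketch a more important token receives more slots, while the cap keeps virtual experts from filling more than a fifth of them, so the second virtual candidate is passed over even though its score is higher than the scores of the remaining real experts.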

Taxonomy

Topics: Multimodal Machine Learning Applications · Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning