MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

Bo Li; Chuan Wu; Shaolin Zhu

arXiv:2605.05225·cs.LG·May 11, 2026

TL;DR

MACS is a training-free inference framework that improves the efficiency of multimodal MoE large language models by addressing load balancing challenges through modality-aware capacity scaling.

Contribution

MACS introduces a training-free approach that combines an Entropy-Weighted Load mechanism with a Dynamic Modality-Adaptive Capacity mechanism to balance expert load during multimodal inference.

Findings

01

MACS significantly reduces the straggler-induced bottleneck in Expert Parallelism (EP) inference for multimodal MoE models.

02

It outperforms existing load balancing methods on multiple benchmarks.

03

MACS adapts to varying visual-to-text ratios across tasks during inference.

Abstract

Mixture-of-Experts Multimodal Large Language Models (MoE MLLMs) suffer from a significant efficiency bottleneck during Expert Parallelism (EP) inference due to the straggler effect. This issue is worsened in the multimodal context, as existing token-count-based load balancing methods fail to address two unique challenges: (1) Information Heterogeneity, where numerous redundant visual tokens are treated the same as semantically critical ones, and (2) Modality Dynamics, where varying visual-to-text ratios across tasks lead to resource misallocation. To address these challenges, we propose MACS (Modality-Aware Capacity Scaling), a training-free inference framework. Specifically, MACS introduces an Entropy-Weighted Load mechanism to quantify the semantic value of visual tokens, addressing information heterogeneity. Additionally, the Dynamic Modality-Adaptive Capacity mechanism allocates…
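
The contrast the abstract draws between token-count-based load and a weighted notion of load can be illustrated with a small sketch. It is not the authors' implementation: the function names, the choice of normalized Shannon entropy as a weight source, and the example weights are all assumptions made here for illustration, since the excerpt does not say which distribution MACS takes the entropy of or how entropy is mapped to a weight. The sketch only shows how the same routing can look imbalanced under token counts and less so once redundant tokens are down-weighted, using the max-to-mean load ratio as a rough proxy for the straggler effect.

```python
import math


def token_count_load(assignments, num_experts):
    """Token-count baseline: every routed token adds 1 to its expert's load."""
    load = [0.0] * num_experts
    for expert in assignments:
        load[expert] += 1.0
    return load


def normalized_entropy(probs):
    """Shannon entropy of a probability distribution, scaled to [0, 1].

    Hypothetical helper: one possible source of per-token weights.
    """
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def weighted_load(assignments, weights, num_experts):
    """Weighted load: each token adds its own weight instead of 1.

    The weights are an assumption (e.g. lower values for redundant
    visual tokens, 1.0 for text tokens).
    """
    load = [0.0] * num_experts
    for expert, weight in zip(assignments, weights):
        load[expert] += weight
    return load


def straggler_ratio(load):
    """Max-to-mean load ratio; 1.0 means perfectly balanced experts."""
    mean = sum(load) / len(load)
    return max(load) / mean if mean > 0 else 0.0


# Illustrative routing: three redundant visual tokens go to expert 0,
# one text token goes to expert 1.
assignments = [0, 0, 0, 1]
weights = [0.2, 0.2, 0.2, 1.0]  # made-up weights, not taken from the paper

count_ratio = straggler_ratio(token_count_load(assignments, 2))
weighted_ratio = straggler_ratio(weighted_load(assignments, weights, 2))
```

With these made-up numbers, the token-count view reports expert 0 as the straggler, while the weighted view reports a smaller imbalance. Whether that matches real compute cost depends on how the weights are derived, which the excerpt leaves unspecified.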
