Adapter-Augmented Bandits for Online Multi-Constrained Multi-Modal Inference Scheduling
Xianzhi Zhang, Yue Xu, Yinlin Zhu, Di Wu, Yipeng Zhou, Miao Hu, Guocong Quan

TL;DR
This paper introduces M-CMAB, a novel multi-adapter framework for online multi-modal large language model inference scheduling, effectively handling heterogeneous budgets and uncertainties to improve response quality.
Contribution
The paper proposes M-CMAB, a new multi-adapter-based scheduling framework with theoretical guarantees, addressing challenges of multi-modal task representation and online decision-making under constraints.
Findings
M-CMAB outperforms state-of-the-art baselines across various budget regimes.
It achieves up to 14.18% higher reward than existing methods.
It closely tracks an oracle-aided upper bound in experiments.
Abstract
Multi-modal large language model (MLLM) inference scheduling enables strong response quality under practical and heterogeneous budgets, beyond what a homogeneous single-backend setting can offer. Yet online MLLM task scheduling is nontrivial, as requests vary sharply in modality composition and latent reasoning difficulty, while execution backends incur distinct, time-varying costs due to system jitter and network variation. These coupled uncertainties pose two core challenges: deriving semantically faithful yet scheduling-relevant multi-modal task representations, and making low-overhead online decisions over irreversible multi-dimensional budgets. Accordingly, we propose M-CMAB (Multi-modal Multi-constraint Contextual Multi-Armed Bandit), a multi-adapter-enhanced MLLM inference scheduling framework with…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
