Adapter-Augmented Bandits for Online Multi-Constrained Multi-Modal Inference Scheduling

Xianzhi Zhang; Yue Xu; Yinlin Zhu; Di Wu; Yipeng Zhou; Miao Hu; Guocong Quan

arXiv:2603.06403·cs.LG·March 9, 2026

Open Access

TL;DR

This paper introduces M-CMAB, a multi-adapter framework for online scheduling of multi-modal large language model (MLLM) inference that handles heterogeneous budgets and coupled task and cost uncertainties to improve response quality.

Contribution

The paper proposes M-CMAB, a new multi-adapter-based scheduling framework with theoretical guarantees, addressing challenges of multi-modal task representation and online decision-making under constraints.

Findings

01. Outperforms state-of-the-art baselines across various budget regimes.

02. Achieves up to 14.18% higher reward compared to existing methods.

03. Closely tracks an oracle-aided upper bound in experiments.

Abstract

Multi-modal large language model (MLLM) inference scheduling enables strong response quality under practical and heterogeneous budgets, beyond what a homogeneous single-backend setting can offer. Yet online MLLM task scheduling is nontrivial, as requests vary sharply in modality composition and latent reasoning difficulty, while execution backends incur distinct, time-varying costs due to system jitter and network variation. These coupled uncertainties pose two core challenges: deriving semantically faithful yet scheduling-relevant multi-modal task representations, and making low-overhead online decisions over irreversible multi-dimensional budgets. Accordingly, we propose M-CMAB (Multi-modal Multi-constraint Contextual Multi-Armed Bandit), a multi-adapter-enhanced MLLM inference scheduling framework with…
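
To make the problem setting in the abstract concrete, the following is a minimal sketch of online backend selection under irreversible multi-dimensional budgets. It is not M-CMAB: the abstract is truncated before the method is described, so this uses a generic primal-dual UCB heuristic as a stand-in, with no adapters and no theoretical guarantee claimed. Everything named here is an assumption made for illustration: the class `BudgetedBandit`, the helper `simulate`, the three backends and their quality and cost figures, the two budget dimensions (latency and money), and the use of a coarse modality label as the context.

```python
import math
import random
from collections import defaultdict


class BudgetedBandit:
    """Hypothetical primal-dual UCB over (context, arm) pairs with hard budgets."""

    def __init__(self, n_arms, budgets, horizon, step=0.05):
        self.n_arms = n_arms
        self.remaining = list(budgets)                    # irreversible: only decreases
        self.per_round = [b / horizon for b in budgets]   # target spend per round
        self.duals = [0.0] * len(budgets)                 # one price per budget dimension
        self.step = step
        self.t = 0
        self.count = defaultdict(int)                     # pulls per (context, arm)
        self.reward_sum = defaultdict(float)              # summed reward per (context, arm)
        self.cost_sum = defaultdict(lambda: [0.0] * len(budgets))

    def exhausted(self):
        return any(r <= 0 for r in self.remaining)

    def select(self, context):
        self.t += 1
        best_arm, best_score = None, -math.inf
        for arm in range(self.n_arms):
            key = (context, arm)
            n = self.count[key]
            if n == 0:
                return arm                                # try every pair once
            mean_reward = self.reward_sum[key] / n
            bonus = math.sqrt(2 * math.log(self.t) / n)   # optimism for rarely tried pairs
            mean_cost = [c / n for c in self.cost_sum[key]]
            # Price each cost dimension relative to its per-round share of the budget.
            penalty = sum(lam * c / p
                          for lam, c, p in zip(self.duals, mean_cost, self.per_round))
            score = mean_reward + bonus - penalty
            if score > best_score:
                best_arm, best_score = arm, score
        return best_arm

    def update(self, context, arm, reward, cost):
        key = (context, arm)
        self.count[key] += 1
        self.reward_sum[key] += reward
        for k, c in enumerate(cost):
            self.cost_sum[key][k] += c
            self.remaining[k] -= c
            # Raise the price if this round overspent its share, lower it otherwise.
            overspend = c / self.per_round[k] - 1.0
            self.duals[k] = max(0.0, self.duals[k] + self.step * overspend)


def simulate(seed=0, horizon=500):
    rng = random.Random(seed)
    # Hypothetical backends: (mean quality, mean latency cost, mean money cost).
    backends = [(0.4, 0.5, 0.2), (0.6, 1.0, 1.0), (0.9, 2.0, 3.0)]
    contexts = ["text", "image+text"]                     # coarse stand-in for task context
    sched = BudgetedBandit(len(backends), budgets=[600.0, 500.0], horizon=horizon)
    total, served = 0.0, 0
    for _ in range(horizon):
        if sched.exhausted():
            break                                         # stop once any budget is spent
        ctx = rng.choice(contexts)
        arm = sched.select(ctx)
        quality, latency, money = backends[arm]
        difficulty = 0.1 if ctx == "image+text" else 0.0  # assumed harder requests
        reward = min(1.0, max(0.0, rng.gauss(quality - difficulty, 0.1)))
        cost = [max(0.0, rng.gauss(latency, 0.1 * latency)),
                max(0.0, rng.gauss(money, 0.1 * money))]  # noisy, jittery costs
        sched.update(ctx, arm, reward, cost)
        total += reward
        served += 1
    return total, served, sched.remaining


if __name__ == "__main__":
    total, served, remaining = simulate()
    print(f"served={served} total_reward={total:.1f} remaining={remaining}")
```

The dual variables act as per-resource prices: a backend that spends more than its per-round share of a budget becomes less attractive in later rounds, which is one common way to trade response quality against several budgets at once. Because the budget check runs before each request, the final request can overshoot a budget by at most one request's cost in this sketch.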

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling