AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process

Xintong Zhang; Xiaowen Zhang; Jingrong Wu; Zhi Gao; Shilin Yan; Zhenxin Diao; Kunpeng Gao; Xuanyan Chen; Yuwei Wu; Yunde Jia; Qing Li

arXiv:2602.02676 · cs.CV · April 9, 2026


TL;DR

AdaptMMBench introduces a comprehensive benchmark for evaluating adaptive multimodal reasoning in vision-language models, focusing on dynamic difficulty assessment, mode selection rationality, and multi-dimensional process analysis.

Contribution

The paper proposes a novel metric based on the Matthews Correlation Coefficient (MCC) and a multi-domain benchmark for evaluating adaptive reasoning, addressing the limitations of static difficulty labels and simplistic metrics.

Findings

01. Adaptive mode selection improves with model capacity but is decoupled from final accuracy.

02. Key step coverage correlates with performance across models.

03. Tool effectiveness varies significantly across model architectures.

Abstract

Adaptive multimodal reasoning has emerged as a promising frontier in Vision-Language Models (VLMs), aiming to switch dynamically between tool-augmented visual reasoning and text reasoning to improve both effectiveness and efficiency. However, existing evaluations rely on static difficulty labels and simplistic metrics, which fail to capture how difficulty varies with model capacity. Consequently, they blur the distinction between adaptive mode selection and general performance, and they neglect fine-grained process analysis. In this paper, we propose AdaptMMBench, a comprehensive benchmark for adaptive multimodal reasoning across five domains (real-world, OCR, GUI, knowledge, and math), encompassing both direct perception and complex reasoning tasks. AdaptMMBench uses a Matthews Correlation Coefficient (MCC) metric to evaluate the selection rationality…
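The abstract names the Matthews Correlation Coefficient as the measure of selection rationality. As a minimal sketch, assuming mode selection is scored as binary classification (positive: the sample warrants tool-augmented visual reasoning for this model; prediction: the model actually invoked tools), MCC follows from the confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). How AdaptMMBench derives the per-model "needs tool" label is not stated in this excerpt; the helper `needs_tool_labels`, which marks a sample as needing tools when the model fails it in text-only mode, is a hypothetical stand-in for difficulty defined relative to model capacity.

```python
import math
from typing import List, Sequence


def needs_tool_labels(text_only_correct: Sequence[bool]) -> List[bool]:
    """Hypothetical per-model difficulty label: a sample 'needs tools' for a
    model if that model answers it incorrectly in text-only mode."""
    return [not ok for ok in text_only_correct]


def mcc(needs_tool: Sequence[bool], used_tool: Sequence[bool]) -> float:
    """Matthews Correlation Coefficient for binary mode selection.

    needs_tool[i]: True if sample i warrants tool-augmented reasoning (positive).
    used_tool[i]:  True if the model chose tool-augmented reasoning on sample i.
    """
    if len(needs_tool) != len(used_tool):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(needs_tool, used_tool))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any confusion-matrix marginal is zero
    # (e.g. a model that always or never calls tools); return 0.0 by convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike the raw accuracy of mode choice, this score stays at 0 for a model that always or never calls tools (under the convention above), so a high MCC reflects selection that tracks per-sample need rather than a fixed habit.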


Code & Models

Repositories

xtong-zhang/AdaptMMBench (GitHub)

Datasets

xintongzhang/AdaptMMBench (dataset, 125 downloads)
