Linking Perception, Confidence and Accuracy in MLLMs

Yuetian Du; Yucheng Wang; Rongyu Zhang; Zhijie Xu; Boyu Yang; Ming Kong; Jie Liu; Qiang Zhu

arXiv:2603.12149 · cs.CV · March 13, 2026

TL;DR

This paper identifies confidence miscalibration in Multi-modal Large Language Models and introduces a novel framework with confidence-based training and test-time scaling to improve perceptual sensitivity and overall performance.

Contribution

The paper proposes Confidence-Driven Reinforcement Learning (CDRL) and Confidence-Aware Test-Time Scaling (CA-TTS) to calibrate model confidence and improve MLLM performance, achieving state-of-the-art results across multiple benchmarks.

Findings

01. Severe confidence miscalibration in MLLMs uncovered.

02. Proposed methods improve calibration and accuracy by 8.8%.

03. Framework achieves consistent gains across four benchmarks.

Abstract

Recent advances in Multi-modal Large Language Models (MLLMs) have predominantly focused on enhancing visual perception to improve accuracy. However, a critical question remains unexplored: Do models know when they do not know? Through a probing experiment, we reveal a severe confidence miscalibration problem in MLLMs. To address this, we propose Confidence-Driven Reinforcement Learning (CDRL), which uses original-noise image pairs and a novel confidence-based reward to enhance perceptual sensitivity and robustly calibrate the model's confidence. Beyond training benefits, calibrated confidence enables more effective test-time scaling as a free lunch. We further propose Confidence-Aware Test-Time Scaling (CA-TTS), which dynamically coordinates Self-Consistency, Self-Reflection, and Visual Self-Check modules guided by confidence signals. An Expert Model acts in multiple roles (e.g.,…
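To make "confidence miscalibration" concrete, the sketch below computes expected calibration error (ECE), a standard way to measure the gap between stated confidence and observed accuracy. It is an illustration only: the abstract does not say which calibration metric the paper uses, and the function name, the equal-width binning, and the sample numbers are assumptions made here.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Return ECE over equal-width confidence bins (hypothetical helper).

    confidences: model-reported confidence per answer, each in [0, 1].
    correct: whether each answer was actually right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group predictions by confidence; 1.0 is folded into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    # ECE is the sample-weighted mean of |average confidence - accuracy|.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Made-up example: a model that claims 90% confidence but is right 1 time in 4.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, False, False, False]
)
```

A model that "knows when it does not know" would score close to zero here, because its confidence in each bin would track its accuracy. The overconfident example above scores about 0.65.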

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis