VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning

Wenyi Xiao; Xinchi Xu; Leilei Gan

arXiv:2604.09529 · cs.CV · April 13, 2026

TL;DR

This paper introduces VL-Calibration, a reinforcement learning framework that improves confidence calibration and reasoning accuracy in large vision-language models by decoupling visual and reasoning confidence.

Contribution

The paper proposes a decoupled confidence calibration method for LVLMs that targets hallucinations and improves both calibration and reasoning performance.

Findings

1. VL-Calibration improves calibration across thirteen benchmarks.

2. It enhances visual reasoning accuracy in LVLMs.

3. The method generalizes across different model scales and architectures.

Abstract

Large Vision Language Models (LVLMs) achieve strong multimodal reasoning but frequently exhibit hallucinations and incorrect responses with high certainty, which hinders their usage in high-stakes domains. Existing verbalized confidence calibration methods, largely developed for text-only LLMs, typically optimize a single holistic confidence score using binary answer-level correctness. This design is mismatched to LVLMs: an incorrect prediction may arise from perceptual failures or from reasoning errors given correct perception, and a single confidence conflates these sources while visual uncertainty is often dominated by language priors. To address these issues, we propose VL-Calibration, a reinforcement learning framework that explicitly decouples confidence into visual and reasoning confidence. To supervise visual confidence without ground-truth perception labels, we introduce an…
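
To make the mismatch described in the abstract concrete, here is a minimal sketch of a single holistic confidence signal driven only by binary answer-level correctness. The Brier-style form and the name `brier_reward` are assumptions chosen for illustration; the abstract does not say which objective existing methods use, and this is not the paper's reward. The two example records are hypothetical.

```python
def brier_reward(confidence: float, correct: bool) -> float:
    """Illustrative reward in [0, 1]: one minus the squared gap between
    the stated confidence and the binary correctness of the final answer."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    target = 1.0 if correct else 0.0
    return 1.0 - (confidence - target) ** 2


# Two hypothetical wrong answers with the same stated confidence but
# different failure sources. The "cause" field is never seen by the reward.
perception_failure = {"confidence": 0.9, "correct": False,
                      "cause": "misread the image"}
reasoning_failure = {"confidence": 0.9, "correct": False,
                     "cause": "faulty inference from correct perception"}

r_perception = brier_reward(perception_failure["confidence"],
                            perception_failure["correct"])
r_reasoning = brier_reward(reasoning_failure["confidence"],
                           reasoning_failure["correct"])

# Both failures receive the same signal, so a single score cannot tell
# the model which source of uncertainty it misjudged.
print(r_perception == r_reasoning)
```

Under this assumed objective, the perceptual failure and the reasoning failure are indistinguishable, which is the conflation that motivates splitting confidence into a visual part and a reasoning part.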
