Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

Reza Khanmohammadi; Erfan Miahi; Simerjot Kaur; Charese H. Smiley; Ivan Brugere; Kundan Thind; and Mohammad M. Ghassemi

arXiv:2605.10893 · cs.CL · May 18, 2026

TL;DR

This paper introduces BICR, a confidence estimation framework for large vision-language models (LVLMs) that distinguishes grounded from ungrounded predictions by contrasting the model's hidden states on the real image with those on a blacked-out image, so that confidence reflects whether the image actually shaped the answer.

Contribution

BICR is a novel, model-agnostic method that trains a lightweight probe to assess visual grounding reliability without extra inference cost.

Findings

01 BICR achieves superior calibration and discrimination across five LVLMs.

02 It outperforms existing baselines with 4-18x fewer parameters.

03 BICR is effective across diverse tasks like VQA, hallucination detection, and medical imaging.

Abstract

Large vision-language models suffer from visual ungroundedness: they can produce a fluent, confident, and even correct response driven entirely by language priors, with the image contributing nothing to the prediction. Existing confidence estimation methods cannot detect this, as they observe model behavior under normal inference with no mechanism to determine whether a prediction was shaped by the image or by text alone. We introduce BICR (Blind-Image Contrastive Ranking), a model-agnostic confidence estimation framework that makes this contrast explicit during training by extracting hidden states from a frozen LVLM twice: once with the real image-question pair, and once with the image blacked out while the question is held fixed. A lightweight probe is trained on the real-image hidden state and regularized by a ranking loss that penalizes higher confidence on the blacked-out view,…
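
The training objective described above can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so everything here is an assumption made for illustration: the probe is a single linear layer with a sigmoid, the supervised term is binary cross-entropy against a correctness label, the ranking term is a hinge with a margin, and the ranking term is applied to every example. The names `probe_confidence`, `ranking_penalty`, `bicr_style_loss`, `lam`, and `margin` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probe_confidence(hidden, weights, bias):
    """Hypothetical linear probe: maps a hidden-state vector to a confidence in (0, 1)."""
    logit = sum(w * h for w, h in zip(weights, hidden)) + bias
    return sigmoid(logit)


def bce(conf, label, eps=1e-7):
    """Binary cross-entropy between a confidence and a 0/1 correctness label."""
    conf = min(max(conf, eps), 1.0 - eps)
    return -(label * math.log(conf) + (1 - label) * math.log(1.0 - conf))


def ranking_penalty(conf_real, conf_blind, margin=0.1):
    """Assumed hinge form: zero once the real-image confidence exceeds
    the blacked-out confidence by at least `margin`."""
    return max(0.0, margin - (conf_real - conf_blind))


def bicr_style_loss(h_real, h_blind, label, weights, bias, lam=1.0, margin=0.1):
    """Supervised loss on the real-image view plus a weighted ranking term.

    h_real and h_blind stand in for hidden states taken from the frozen LVLM
    with the real image and with the blacked-out image, respectively.
    """
    c_real = probe_confidence(h_real, weights, bias)
    c_blind = probe_confidence(h_blind, weights, bias)
    return bce(c_real, label) + lam * ranking_penalty(c_real, c_blind, margin)


if __name__ == "__main__":
    weights, bias = [1.0, -1.0], 0.0
    h_real, h_blind = [2.0, 0.0], [0.0, 0.0]
    print(bicr_style_loss(h_real, h_blind, 1, weights, bias))
```

In this sketch the ranking term vanishes whenever the probe is already more confident on the real-image view by the chosen margin, so it only contributes gradient on examples where the blacked-out view looks at least as trustworthy as the real one. Only the probe parameters would be trained; the LVLM itself stays frozen, as the abstract states.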

Code & Models

Datasets

ledengary/VLCB
dataset· 119 dl
