VLM-UQBench: A Benchmark for Modality-Specific and Cross-Modality Uncertainties in Vision Language Models
Chenyu Wang, Tianle Chen, H. M. Sabbir Ahmad, Kayhan Batmanghelich, Wenchao Li

TL;DR
This paper introduces VLM-UQBench, a comprehensive benchmark for evaluating modality-specific and cross-modal uncertainty in vision-language models (VLMs). It shows that current uncertainty quantification (UQ) methods struggle to detect subtle, instance-level ambiguities.
Contribution
The paper presents VLM-UQBench, a new benchmark with perturbation-based evaluation metrics for modality-aware uncertainty in VLMs, highlighting gaps in current UQ methods.
Findings
Existing UQ methods show modality-specific strengths and weaknesses.
UQ scores weakly correlate with hallucinations and often fail to detect subtle ambiguities.
UQ methods perform comparably to reasoning-based baselines on overt ambiguities but struggle with fine-grained uncertainty.
Abstract
Uncertainty quantification (UQ) is vital for ensuring that vision-language models (VLMs) behave safely and reliably. A central challenge is to localize uncertainty to its source, determining whether it arises from the image, the text, or misalignment between the two. We introduce VLM-UQBench, a benchmark for modality-specific and cross-modal data uncertainty in VLMs. It consists of 600 real-world samples drawn from the VizWiz dataset, curated into clean, image-, text-, and cross-modal uncertainty subsets, and a scalable perturbation pipeline with 8 visual, 5 textual, and 3 cross-modal perturbations. We further propose two simple metrics that quantify the sensitivity of UQ scores to these perturbations and their correlation with hallucinations, and use them to evaluate a range of UQ methods across four VLMs and three datasets. Empirically, we find that: (i) existing UQ methods exhibit…
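The abstract does not define the two metrics, so the sketch below only illustrates the general idea under stated assumptions. It is not the paper's formulation. Two stand-ins are assumed. Sensitivity is taken to be the fraction of paired samples whose UQ score rises after a perturbation. Hallucination correlation is measured as the AUROC of UQ scores for separating hallucinated from correct answers, a rank-based choice swapped in for whatever correlation measure the authors use. The function names `perturbation_sensitivity` and `hallucination_auroc` are hypothetical.

```python
def perturbation_sensitivity(clean_scores, perturbed_scores):
    """Hypothetical sensitivity metric (assumption, not the paper's definition).

    clean_scores[i] and perturbed_scores[i] are UQ scores for the same sample
    before and after a perturbation (higher = more uncertain). Returns the
    fraction of samples whose score strictly increases under the perturbation.
    """
    if len(clean_scores) != len(perturbed_scores) or not clean_scores:
        raise ValueError("need equal-length, non-empty score lists")
    increases = [p > c for c, p in zip(clean_scores, perturbed_scores)]
    return sum(increases) / len(increases)


def hallucination_auroc(scores, hallucinated):
    """Hypothetical hallucination-correlation metric (AUROC stand-in).

    Probability that a randomly chosen hallucinated answer (label True) gets
    a higher UQ score than a randomly chosen correct answer (label False);
    ties count as 0.5, so 0.5 means the scores carry no signal.
    """
    pos = [s for s, h in zip(scores, hallucinated) if h]
    neg = [s for s, h in zip(scores, hallucinated) if not h]
    if not pos or not neg:
        raise ValueError("need at least one hallucinated and one correct answer")
    # Compare every hallucinated score against every correct score.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up scores (illustrative only, not benchmark data).
clean = [0.2, 0.4, 0.1, 0.5]
perturbed = [0.6, 0.3, 0.4, 0.9]
print(perturbation_sensitivity(clean, perturbed))  # 0.75
print(hallucination_auroc([0.9, 0.8, 0.3, 0.2], [True, False, True, False]))  # 0.75
```

A value near 0.5 on the AUROC stand-in would correspond to the "weak correlation with hallucinations" finding. A sensitivity score near 1.0 would mean a UQ method reliably registers the injected ambiguity.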
Taxonomy
Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Topic Modeling
