VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation
Ruiyang Zhang, Hu Zhang, Zhedong Zheng

TL;DR
VL-Uncertainty is a novel framework that detects hallucinations in large vision-language models by measuring response uncertainty through semantic clustering, outperforming existing methods across multiple benchmarks.
Contribution
Introduces the first uncertainty-based hallucination detection method for LVLMs that does not rely on ground-truth annotations.
Findings
Outperforms baseline methods in hallucination detection
Effective across diverse LVLMs and benchmarks
Utilizes semantic clustering and entropy for uncertainty measurement
Abstract
Given the higher information load processed by large vision-language models (LVLMs) compared to single-modal LLMs, detecting LVLM hallucinations requires more human effort and time, and thus raises wider safety concerns. In this paper, we introduce VL-Uncertainty, the first uncertainty-based framework for detecting hallucinations in LVLMs. Unlike most existing methods, which require ground-truth or pseudo annotations, VL-Uncertainty uses uncertainty as an intrinsic metric. We measure uncertainty by analyzing the prediction variance across semantically equivalent but perturbed prompts, including visual and textual data. When LVLMs are highly confident, they provide consistent responses to semantically equivalent queries. However, when uncertain, the responses of the target LVLM become more random. Considering semantically similar answers with different wordings, we cluster…
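The abstract's core idea (group responses that mean the same thing, then measure how spread out the groups are) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the excerpt above is truncated before it describes how clustering is done, so the function names `cluster_responses`, `cluster_entropy`, and `naive_equivalent` are hypothetical, and the string-normalization equivalence check is a placeholder assumption standing in for a real semantic-equivalence test.

```python
import math
from typing import Callable, List


def naive_equivalent(a: str, b: str) -> bool:
    """Placeholder equivalence check (an assumption, not the paper's method):
    two answers match if they are equal after lowercasing, trimming
    surrounding punctuation, and collapsing whitespace."""
    def norm(s: str) -> str:
        return " ".join(s.lower().strip(" .!?").split())
    return norm(a) == norm(b)


def cluster_responses(
    responses: List[str],
    equivalent: Callable[[str, str], bool] = naive_equivalent,
) -> List[List[str]]:
    """Greedily group responses: each response joins the first cluster whose
    representative (first member) it is equivalent to, else starts a new one."""
    clusters: List[List[str]] = []
    for response in responses:
        for cluster in clusters:
            if equivalent(cluster[0], response):
                cluster.append(response)
                break
        else:
            clusters.append([response])
    return clusters


def cluster_entropy(clusters: List[List[str]]) -> float:
    """Shannon entropy (in nats) of the empirical distribution over clusters.
    Zero means every response fell into one cluster (fully consistent)."""
    total = sum(len(c) for c in clusters)
    return sum(
        (len(c) / total) * math.log(total / len(c)) for c in clusters
    )


# Responses an LVLM might give to perturbed versions of the same query.
consistent = ["A cat.", "a cat", "A cat!"]
scattered = ["a cat", "a dog", "a bird", "a cat"]

low = cluster_entropy(cluster_responses(consistent))
high = cluster_entropy(cluster_responses(scattered))
print(f"consistent: {low:.3f}, scattered: {high:.3f}")
```

Under this sketch, a higher entropy would be read as higher uncertainty and therefore a greater chance of hallucination; where to place the decision threshold is not specified in the excerpt and would have to be chosen separately.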
Taxonomy
Topics: Anomaly Detection Techniques and Applications
