Visual hallucination detection in large vision-language models via evidential conflict

Tao Huang; Zhekun Liu; Rui Wang; Yang Zhang; Liping Jing

arXiv:2506.19513·cs.CV·June 25, 2025

TL;DR

This paper introduces a new benchmark for evaluating visual hallucinations in large vision-language models (LVLMs), covering both perception and reasoning errors, and proposes a detection method based on Dempster-Shafer theory (DST) that outperforms existing uncertainty metrics.

Contribution

The paper develops the PRE-HAL dataset for comprehensive hallucination evaluation and introduces the first DST-based detection method for LVLMs, addressing both perception and reasoning hallucinations.
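This summary does not give the paper's exact formulation, so the following is only general background on Dempster-Shafer theory, not the authors' method. It is a minimal sketch of how the conflict between two sources of evidence is measured under Dempster's rule of combination. The function names and the toy "yes"/"no" frame are assumptions made for illustration.

```python
from itertools import product


def dempster_conflict(m1, m2):
    """Conflict mass K: total product mass assigned to disjoint focal sets.

    Each mass function is a dict mapping a frozenset of hypotheses
    (a focal set) to its mass; the masses are expected to sum to 1.
    """
    return sum(
        w1 * w2
        for (a, w1), (b, w2) in product(m1.items(), m2.items())
        if not (a & b)  # empty intersection means the two pieces of evidence disagree
    )


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule (normalised by 1 - K)."""
    k = dempster_conflict(m1, m2)
    if k >= 1.0:
        raise ValueError("sources are in total conflict; combination is undefined")
    combined = {}
    for (a, w1), (b, w2) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2 / (1.0 - k)
    return combined


# Hypothetical toy example: two sources weigh in on a yes/no question.
YES, NO = frozenset({"yes"}), frozenset({"no"})
EITHER = YES | NO  # mass on the whole frame expresses ignorance

source_a = {YES: 0.7, EITHER: 0.3}
source_b = {NO: 0.6, EITHER: 0.4}

conflict = dempster_conflict(source_a, source_b)  # roughly 0.42
fused = dempster_combine(source_a, source_b)
```

A high conflict value signals that the sources support incompatible answers, which is the intuition behind using evidential conflict as an uncertainty signal.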

Findings

01

PRE-HAL exposes more vulnerabilities in LVLMs, especially in relation reasoning.

02

The DST-based method outperforms five baseline uncertainty metrics.

03

The method achieves average AUROC improvements of 4%, 10%, and 7% across three LVLMs.
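For readers unfamiliar with the metric, AUROC for a hallucination detector is the probability that a randomly chosen hallucinated response receives a higher uncertainty score than a randomly chosen faithful one. The sketch below computes it by direct pairwise comparison. The function name, the label convention (1 = hallucinated), and the sample scores are assumptions for illustration and are not taken from the paper.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    scores: uncertainty score per response (higher = more likely hallucinated).
    labels: 1 for a hallucinated response, 0 for a faithful one.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: three of the four (positive, negative) pairs are ordered correctly.
example = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.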

Abstract

Despite the remarkable multimodal capabilities of Large Vision-Language Models (LVLMs), discrepancies often occur between visual inputs and textual outputs, a phenomenon we term visual hallucination. This critical reliability gap poses substantial risks in safety-critical Artificial Intelligence (AI) applications, necessitating a comprehensive evaluation benchmark and effective detection methods. First, we observe that existing visual-centric hallucination benchmarks mainly assess LVLMs from a perception perspective, overlooking hallucinations arising from advanced reasoning capabilities. We develop the Perception-Reasoning Evaluation Hallucination (PRE-HAL) dataset, which enables the systematic evaluation of both perception and reasoning capabilities of LVLMs across multiple visual semantics, such as instances, scenes, and relations. Comprehensive evaluation with this new benchmark…
