What Makes "Good" Distractors for Object Hallucination Evaluation in Large Vision-Language Models?

Ming-Kun Xie; Jia-Hao Xiao; Gang Niu; Lei Feng; Zhiqiang Kou; Min-Ling Zhang; and Masashi Sugiyama

arXiv:2508.06530·cs.CV·August 12, 2025

TL;DR

This paper introduces HOPE, a benchmark for evaluating object hallucination in large vision-language models (LVLMs). HOPE generates highly misleading distractors and exposes the models' hallucination vulnerabilities more effectively than previous benchmarks.

Contribution

The paper proposes the HOPE benchmark, which uses content-aware and description-based hallucination searching to better assess LVLMs' object hallucination issues.

Findings

01. HOPE causes a 9-23% performance drop in state-of-the-art LVLMs.

02. HOPE outperforms POPE in exposing hallucination vulnerabilities.

03. Experimental results validate HOPE's effectiveness in rigorous hallucination assessment.

Abstract

Large Vision-Language Models (LVLMs), empowered by the success of Large Language Models (LLMs), have achieved impressive performance across domains. Despite these advances, LVLMs still suffer from object hallucination: they tend to generate objects that are inconsistent with the image content. The most commonly used benchmark, Polling-based Object Probing Evaluation (POPE), evaluates this issue by sampling negative categories according to category-level statistics, e.g., category frequencies and co-occurrence. However, as LVLMs continue to advance, POPE has become less effective at assessing object hallucination, because its simplistic sampling strategy overlooks image-specific information and restricts distractors to negative object categories only. In this paper, we introduce the Hallucination…
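The category-level sampling that the abstract attributes to POPE can be illustrated with a minimal sketch. It assumes a toy dataset in which each image is represented only by its set of annotated categories; the function name `sample_negatives`, the mode names, and the data are hypothetical and are not taken from the POPE or HOPE code. The sketch ranks categories absent from an image either by overall frequency or by co-occurrence with the categories that are present.

```python
from collections import Counter


def sample_negatives(annotations, present, k=2, mode="popular"):
    """Pick k absent categories as distractors (illustrative only).

    annotations: list of sets, one set of category names per image (toy data).
    present: set of categories annotated in the target image.
    mode: "popular" ranks by dataset frequency,
          "cooccurrence" ranks by co-occurrence with present categories.
    """
    scores = Counter()
    if mode == "popular":
        # Count how many images contain each category.
        for objs in annotations:
            scores.update(objs)
    elif mode == "cooccurrence":
        # Count how often a category appears alongside a present category.
        for objs in annotations:
            for _ in present & objs:
                scores.update(objs - present)
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Keep only categories absent from the image; break ties alphabetically.
    candidates = sorted(c for c in scores if c not in present)
    candidates.sort(key=lambda c: -scores[c])
    return candidates[:k]


# Hypothetical annotations for five images.
toy = [
    {"person", "dog", "frisbee"},
    {"person", "car"},
    {"person", "dog", "leash"},
    {"car", "truck"},
    {"dog", "frisbee"},
]

popular = sample_negatives(toy, {"dog"}, k=2, mode="popular")
cooccurring = sample_negatives(toy, {"dog"}, k=2, mode="cooccurrence")
print(popular, cooccurring)
```

Both rankings depend only on label statistics; nothing in the sketch inspects the image itself, which is the limitation the abstract points to when it says this style of sampling overlooks image-specific information.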

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Big Data and Digital Economy · Misinformation and Its Impacts