H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models

Nhi Pham; Michael Schott

arXiv:2411.04077 · cs.CV · May 12, 2026

TL;DR

H-POPE is a hierarchical benchmark that systematically evaluates hallucinations in large vision-language models (LVLMs). It shows that these models frequently make claims about object existence and object attributes that are inconsistent with the image.

Contribution

The paper introduces H-POPE, a novel coarse-to-fine benchmark for assessing hallucinations in LVLMs, focusing on object existence and attribute consistency.
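The coarse-to-fine idea can be illustrated with a minimal sketch of polling-style yes/no probe construction. This is not the authors' implementation: the function `build_probes`, the `Probe` record, the question templates, and the way negatives and wrong attributes are supplied are all hypothetical assumptions made for illustration. Coarse probes ask whether an object exists; fine probes ask about an attribute of an object that is actually present, paired with a distractor attribute whose correct answer is "no".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    level: str     # "existence" (coarse) or "attribute" (fine); hypothetical labels
    question: str  # yes/no question posed to the LVLM
    answer: str    # ground-truth answer: "yes" or "no"


def build_probes(present, absent, attributes, attribute_distractors):
    """Hypothetical coarse-to-fine probe builder (illustrative only).

    present: objects annotated in the image (positive existence probes)
    absent: sampled objects not in the image (negative existence probes)
    attributes: {object: true attribute}, e.g. {"car": "red"}
    attribute_distractors: {object: wrong attribute}, e.g. {"car": "blue"}
    """
    present_set = set(present)
    probes = []
    # Coarse level: object existence.
    for obj in present:
        probes.append(Probe("existence", f"Is there a {obj} in the image?", "yes"))
    for obj in absent:
        probes.append(Probe("existence", f"Is there a {obj} in the image?", "no"))
    # Fine level: attributes, asked only about objects that actually exist.
    for obj, attr in attributes.items():
        if obj not in present_set:
            continue
        probes.append(Probe("attribute", f"Is the {obj} {attr}?", "yes"))
        wrong = attribute_distractors.get(obj)
        if wrong is not None:
            probes.append(Probe("attribute", f"Is the {obj} {wrong}?", "no"))
    return probes


if __name__ == "__main__":
    for p in build_probes(["car", "dog"], ["cat"], {"car": "red"}, {"car": "blue"}):
        print(p.level, "|", p.question, "->", p.answer)
```

Balancing "yes" and "no" answers at each level keeps a model that always answers "yes" from scoring well at either granularity.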

Findings

01

Models frequently hallucinate object existence.

02

Hallucinations are more common with fine-grained attributes.

03

Models often do not rely solely on the visual input when generating their textual output.

Abstract

By leveraging both texts and images, large vision-language models (LVLMs) have shown significant progress in various multi-modal tasks. Nevertheless, these models often suffer from hallucinations, i.e., inconsistencies between the visual input and the textual output. To address this, we propose H-POPE, a coarse-to-fine-grained benchmark that systematically assesses hallucination in object existence and attributes. Our evaluation shows that models are prone to hallucinations on object existence, and even more so on fine-grained attributes. We further investigate whether these models rely on visual input to formulate the output texts.
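Comparing hallucination rates between the coarse (existence) and fine (attribute) levels requires scoring yes/no answers separately per level. The sketch below assumes the standard binary-classification metrics used by polling-based probing (accuracy, precision, recall, F1, and the fraction of "yes" answers), with "yes" as the positive class. The helper names `normalize_answer`, `binary_metrics`, and `metrics_by_level`, the record format, and the answer-parsing rule are hypothetical choices, not taken from the paper.

```python
from collections import defaultdict


def normalize_answer(text):
    """Hypothetical parser: map a free-form model reply to "yes" or "no"."""
    return "yes" if text.strip().lower().startswith("yes") else "no"


def binary_metrics(predictions, labels):
    """Binary metrics with "yes" as the positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and y == "yes" for p, y in pairs)
    fp = sum(p == "yes" and y == "no" for p, y in pairs)
    tn = sum(p == "no" and y == "no" for p, y in pairs)
    fn = sum(p == "no" and y == "yes" for p, y in pairs)
    n = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Share of "yes" answers; a high value hints at a bias toward affirming.
        "yes_ratio": (tp + fp) / n if n else 0.0,
    }


def metrics_by_level(records):
    """records: iterable of (level, raw_model_reply, ground_truth) triples."""
    grouped = defaultdict(lambda: ([], []))
    for level, reply, label in records:
        preds, labels = grouped[level]
        preds.append(normalize_answer(reply))
        labels.append(label)
    return {level: binary_metrics(p, l) for level, (p, l) in grouped.items()}


if __name__ == "__main__":
    demo = [
        ("existence", "Yes, there is.", "yes"),
        ("existence", "No.", "no"),
        ("existence", "Yes", "no"),
        ("attribute", "Yes, it is.", "no"),
        ("attribute", "No", "yes"),
    ]
    for level, scores in metrics_by_level(demo).items():
        print(level, scores)
```

A drop in accuracy or F1 from the existence level to the attribute level would correspond to the finding that fine-grained attributes are hallucinated more often.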
