H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models
Nhi Pham, Michael Schott

TL;DR
H-POPE is a hierarchical benchmark designed to systematically evaluate hallucinations in large vision-language models, revealing that their claims about object existence and object attributes are often inconsistent with the image.
Contribution
The paper introduces H-POPE, a novel coarse-to-fine benchmark for assessing hallucinations in LVLMs, focusing on object existence and attribute consistency.
Findings
Models frequently hallucinate object existence.
Hallucinations are more common with fine-grained attributes.
Models often do not rely solely on the visual input when producing their answers.
Abstract
By leveraging both texts and images, large vision-language models (LVLMs) have shown significant progress in various multi-modal tasks. Nevertheless, these models often suffer from hallucinations, i.e., they exhibit inconsistencies between the visual input and the textual output. To address this, we propose H-POPE, a coarse-to-fine-grained benchmark that systematically assesses hallucination in object existence and attributes. Our evaluation shows that models are prone to hallucinations on object existence, and even more so on fine-grained attributes. We further investigate whether these models rely on visual input to formulate the output texts.
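The abstract does not spell out the probing protocol, so the following is only a minimal sketch of what a coarse-to-fine, polling-based evaluation could look like. It assumes a POPE-style setup: yes/no questions whose answers are known from annotations, asked first at the coarse level (does the object exist?) and then at the fine level (does a present object have a given attribute?), scored with the accuracy, precision, recall, F1 and yes-ratio metrics used by the original POPE benchmark. The annotation format, question templates, negative-sampling rule, and the names `build_probes`, `pope_metrics` and `evaluate` are hypothetical illustrations, not the paper's implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class Probe:
    level: str      # "existence" (coarse) or "attribute" (fine)
    question: str
    label: bool     # ground-truth answer: True means "yes"


def build_probes(objects, object_vocab, attribute_vocab, rng):
    """Build balanced yes/no probes for one image (hypothetical format).

    objects: present objects mapped to their attributes, e.g. {"car": {"color": "red"}}.
    object_vocab: candidate object names; absent ones serve as negatives.
    attribute_vocab: attribute type mapped to its possible values.
    """
    probes = []
    absent = [o for o in object_vocab if o not in objects]
    for obj, attrs in objects.items():
        # Coarse level: one positive and one randomly sampled negative existence probe.
        probes.append(Probe("existence", f"Is there a {obj} in the image?", True))
        if absent:
            neg = rng.choice(absent)
            probes.append(Probe("existence", f"Is there a {neg} in the image?", False))
        # Fine level: attribute probes only for objects that are actually present.
        for attr_type, value in attrs.items():
            probes.append(Probe("attribute", f"Is the {obj} {value}?", True))
            wrong = [v for v in attribute_vocab[attr_type] if v != value]
            if wrong:
                probes.append(Probe("attribute", f"Is the {obj} {rng.choice(wrong)}?", False))
    return probes


def pope_metrics(labels, predictions):
    """Accuracy, precision, recall, F1 and yes-ratio over yes/no answers."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for l, p in pairs if l and p)
    fp = sum(1 for l, p in pairs if not l and p)
    fn = sum(1 for l, p in pairs if l and not p)
    tn = sum(1 for l, p in pairs if not l and not p)
    n = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / n,  # share of "yes" answers
    }


def evaluate(probes, answer_fn):
    """Score a model separately at each level; answer_fn maps a question to True/False."""
    results = {}
    for level in ("existence", "attribute"):
        subset = [p for p in probes if p.level == level]
        if subset:
            preds = [answer_fn(p.question) for p in subset]
            results[level] = pope_metrics([p.label for p in subset], preds)
    return results


# Usage: a degenerate model that always answers "yes".
rng = random.Random(0)
objects = {"car": {"color": "red"}, "dog": {"color": "brown"}}
object_vocab = ["car", "dog", "bicycle", "person", "cat"]
attribute_vocab = {"color": ["red", "blue", "brown", "white"]}
probes = build_probes(objects, object_vocab, attribute_vocab, rng)
print(evaluate(probes, lambda question: True))
```

Because the probes are balanced between "yes" and "no", the always-yes model gets an accuracy of 0.5 and a yes-ratio of 1.0 at both levels; reporting the yes-ratio alongside accuracy is what exposes this kind of answer bias. Keeping the existence and attribute scores separate mirrors the coarse-to-fine comparison described above.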
