AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

Xiyang Wu; Tianrui Guan; Dianqi Li; Shuaiyi Huang; Xiaoyu Liu; Xijun Wang; Ruiqi Xian; Abhinav Shrivastava; Furong Huang; Jordan Lee Boyd-Graber; Tianyi Zhou; Dinesh Manocha

arXiv:2406.10900·cs.CV·October 10, 2024

TL;DR

AutoHallusion introduces an automated method to generate diverse hallucination benchmarks for vision-language models, enabling scalable evaluation and revealing common failure patterns to improve model robustness.

Contribution

AutoHallusion is the first automated approach to creating diverse hallucination benchmarks for large vision-language models (LVLMs), reducing human bias and enabling large-scale evaluation.

Findings

01. AutoHallusion induces hallucinations with high success rates: 97.7% on synthetic data and 98.7% on real-world data.

02. The generated benchmarks challenge models to overcome contextual biases.

03. The evaluation reveals common failure patterns and reasons for hallucinations.

Abstract

Large vision-language models (LVLMs) are prone to hallucinations, where certain contextual cues in an image can trigger the language module to produce overconfident and incorrect reasoning about abnormal or hypothetical objects. While some benchmarks have been developed to investigate LVLM hallucinations, they often rely on hand-crafted corner cases whose failure patterns may not generalize well. Additionally, fine-tuning on these examples could undermine their validity. To address this, we aim to scale up the number of cases through an automated approach, reducing human bias in crafting such corner cases. This motivates the development of AutoHallusion, the first automated benchmark generation approach that employs several key strategies to create a diverse range of hallucination examples. Our generated visual-question pairs pose significant challenges to LVLMs, requiring them to…

Code & Models

Datasets

IntelligenceLab/VideoHallu
dataset · 3.1k downloads

Videos

AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

Taxonomy

Topics: Psychedelics and Drug Studies · Epilepsy research and treatment · Cell Image Analysis Techniques