AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha

TL;DR
AutoHallusion introduces an automated method to generate diverse hallucination benchmarks for vision-language models, enabling scalable evaluation and revealing common failure patterns to improve model robustness.
Contribution
AutoHallusion is the first automated approach to creating diverse hallucination benchmarks for large vision-language models (LVLMs), reducing human bias and enabling large-scale evaluation.
Findings
Hallucinations were induced with success rates of 97.7% and 98.7% on the synthetic and real-world datasets, respectively, across the evaluated models.
Generated benchmarks challenge models to overcome contextual biases.
Revealed common failure patterns and reasons for hallucinations.
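As a rough illustration of what a hallucination-induction success rate measures, the sketch below computes the fraction of benchmark cases in which a model's answer disagrees with the ground truth. The summary does not say how a success is counted, so the `Case` record, the `induced` rule, and the toy data are all hypothetical and are not taken from AutoHallusion itself.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One visual-question pair with the model's reply (hypothetical schema)."""
    question: str
    ground_truth: str
    model_answer: str


def induced(case: Case) -> bool:
    # Assumption: a hallucination counts as induced when the model's answer
    # differs from the ground truth after trimming and lowercasing.
    return case.model_answer.strip().lower() != case.ground_truth.strip().lower()


def success_rate(cases: list[Case]) -> float:
    """Fraction of cases in which a hallucination was induced."""
    if not cases:
        raise ValueError("success rate is undefined for an empty benchmark")
    return sum(induced(c) for c in cases) / len(cases)


# Toy data for illustration only; these are not results from the paper.
cases = [
    Case("Is there a keyboard on the desk?", "no", "Yes"),
    Case("Is there a surfboard in the kitchen?", "yes", "no"),
    Case("Is there a cup on the table?", "yes", "yes"),
    Case("Is there a stop sign in the office?", "yes", "No"),
]

rate = success_rate(cases)
print(f"Induction success rate: {rate:.1%}")
```

Exact string matching is the simplest possible rule. A real evaluation would more likely need a sturdier way of comparing free-form answers.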
Abstract
Large vision-language models (LVLMs) are prone to hallucinations, where certain contextual cues in an image can trigger the language module to produce overconfident and incorrect reasoning about abnormal or hypothetical objects. While some benchmarks have been developed to investigate LVLM hallucinations, they often rely on hand-crafted corner cases whose failure patterns may not generalize well. Additionally, fine-tuning on these examples could undermine their validity. To address this, we aim to scale up the number of cases through an automated approach, reducing human bias in crafting such corner cases. This motivates the development of AutoHallusion, the first automated benchmark generation approach that employs several key strategies to create a diverse range of hallucination examples. Our generated visual-question pairs pose significant challenges to LVLMs, requiring them to…
Taxonomy
Topics: Psychedelics and Drug Studies · Epilepsy research and treatment · Cell Image Analysis Techniques
