HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning

Zhecan Wang; Garrett Bingham; Adams Yu; Quoc Le; Thang Luong; Golnaz Ghiasi

arXiv:2407.15680·cs.CV·July 23, 2024

TL;DR

HaloQuest is a new multimodal hallucination dataset that uses synthetic and real images to evaluate and improve vision-language models, revealing current models' struggles and proposing new evaluation methods.

Contribution

Introduces HaloQuest, a large-scale dataset with synthetic images for benchmarking and fine-tuning VLMs to reduce hallucination in multimodal reasoning.

Findings

01. Current VLMs achieve below 36% accuracy on HaloQuest.

02. Fine-tuning on HaloQuest reduces hallucination without harming standard reasoning.

03. Model performance on generated images correlates highly (r = 0.97) with performance on real images.
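The r value in the last finding is a correlation coefficient. As a minimal sketch of how such a figure could be computed, assuming it is a Pearson correlation over per-model accuracies on the real-image and generated-image subsets (the listing does not state which coefficient was used), the following uses only the standard library. The function name `pearson_r` and the accuracy values are hypothetical placeholders, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder accuracies (%), one entry per model.
# These are NOT numbers from the paper; they only illustrate the calculation.
real_acc = [10.0, 20.0, 30.0, 40.0]
synthetic_acc = [12.0, 18.0, 33.0, 41.0]

r = pearson_r(real_acc, synthetic_acc)
print(f"r = {r:.2f}")
```

A value of r close to 1 would mean that models rank and score similarly on both image sources, which is the property the finding relies on when treating generated images as a stand-in for real ones.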

Abstract

Hallucination has been a major problem for large language models and remains a critical challenge when it comes to multimodality in which vision-language models (VLMs) have to deal with not just textual but also visual inputs. Despite rapid progress in VLMs, resources for evaluating and addressing multimodal hallucination are limited and mostly focused on evaluation. This work introduces HaloQuest, a novel visual question answering dataset that captures various aspects of multimodal hallucination such as false premises, insufficient contexts, and visual challenges. A novel idea from HaloQuest is to leverage synthetic images, apart from real ones, to enable dataset creation at scale. With over 7.7K examples spanning across a wide variety of categories, HaloQuest was designed to be both a challenging benchmark for VLMs and a fine-tuning dataset for advancing multimodal reasoning. Our…

Code & Models

Repositories

google/haloquest (Official)

Datasets

johko/HaloQuest
dataset · 49 dl

Taxonomy

Topics: Data Visualization and Analytics