BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models

Moon Ye-Bin; Nam Hyeon-Woo; Wonseok Choi; Tae-Hyun Oh

arXiv:2407.13442·cs.CV·July 19, 2024

TL;DR

This paper introduces BEAF, a new dataset and metrics to evaluate hallucination in vision-language models by assessing their understanding of scene changes through image editing and scene manipulation.

Contribution

We propose BEAF, a novel benchmark with metrics based on scene changes to better evaluate hallucination in vision-language models.

Findings

01 VLMs show varied hallucination behaviors across different metrics

02 Our metrics reveal aspects of hallucination not previously reported

03 BEAF effectively assesses scene understanding in VLMs

Abstract

Vision language models (VLMs) perceive the world through a combination of a visual encoder and a large language model (LLM). The visual encoder, pre-trained on large-scale vision-text datasets, provides zero-shot generalization to visual data, and the LLM lends VLMs its strong reasoning ability. Together, these let VLMs achieve high performance on a wide range of benchmarks without fine-tuning, exhibiting zero-shot or few-shot capability. However, recent studies show that VLMs are vulnerable to hallucination. This undesirable behavior degrades reliability and credibility, leaving users unable to fully trust the output of VLMs. To enhance trustworthiness and better tackle the hallucination of VLMs, we curate a new evaluation dataset, called the BEfore-AFter hallucination dataset (BEAF), and introduce new metrics: True Understanding (TU), IGnorance (IG), StuBbornness (SB), and InDecision (ID). Unlike…
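The before-after idea can be illustrated with a small sketch. The abstract is truncated before it defines the four metrics, so the code below does not reproduce the paper's formulas. It only tallies how a model's yes/no answers change between an original image and an edited one in which an object has been removed. The names `Pair`, `classify`, and `rates`, and the category labels, are hypothetical. One plausible reading (an assumption, not confirmed by the text above) is that `both_correct` relates to TU, `both_wrong` to IG, `unchanged` to SB, and `flipped` to ID.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    """One question asked on the original and on the edited image (hypothetical schema)."""
    before: str    # model answer on the original image: "yes" or "no"
    after: str     # model answer on the edited image: "yes" or "no"
    removed: bool  # True if the question is about the object that was removed


def classify(p: Pair) -> str:
    """Assign an answer pair to an illustrative category (labels are assumptions)."""
    if p.removed:
        # Expected answers: "yes" before the edit, "no" after it.
        if p.before == "yes" and p.after == "no":
            return "both_correct"
        if p.before == "no" and p.after == "yes":
            return "both_wrong"
        return "unchanged"  # same answer before and after the removal
    # Question about an object untouched by the edit: the answer should not change.
    return "stable" if p.before == p.after else "flipped"


def rates(pairs: List[Pair]) -> Dict[str, float]:
    """Return each category's share within its own group (removed vs. untouched)."""
    counts = Counter(classify(p) for p in pairs)
    n_removed = sum(1 for p in pairs if p.removed)
    n_kept = len(pairs) - n_removed
    out: Dict[str, float] = {}
    for name in ("both_correct", "both_wrong", "unchanged"):
        out[name] = counts[name] / n_removed if n_removed else 0.0
    for name in ("stable", "flipped"):
        out[name] = counts[name] / n_kept if n_kept else 0.0
    return out


if __name__ == "__main__":
    demo = [
        Pair("yes", "no", True),
        Pair("yes", "yes", True),
        Pair("no", "yes", True),
        Pair("no", "no", True),
        Pair("yes", "no", False),
        Pair("yes", "yes", False),
    ]
    print(rates(demo))
```

Splitting the denominators by group keeps questions about the removed object separate from questions about untouched objects, so a model that simply answers "yes" to everything shows up as `unchanged` rather than as partially correct.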
