THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models

Prannay Kaul; Zhizhong Li; Hao Yang; Yonatan Dukler; Ashwin Swaminathan; C. J. Taylor; Stefano Soatto

arXiv:2405.05256·cs.CV·April 4, 2025

TL;DR

This paper introduces THRONE, a new benchmark and framework for quantitatively evaluating and reducing open-ended hallucinations in large vision-language models, addressing a gap in existing benchmarks.

Contribution

We propose THRONE, an object-based automatic framework for measuring Type I hallucinations in free-form LVLM outputs, and demonstrate its effectiveness and limitations.

Findings

01 Progress on existing benchmarks, which target Type II hallucinations, does not translate into fewer Type I hallucinations.

02 Type I and Type II hallucinations are often anti-correlated.

03 A simple data augmentation method reduces both hallucination types.

Abstract

Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended free-form responses, which we term "Type I hallucinations". Instead, they focus on hallucinations responding to very specific question formats -- typically a multiple-choice response regarding a particular object or attribute -- which we term "Type II hallucinations". Additionally, such benchmarks often require external API calls to models which are subject to change. In practice, we observe that a reduction in Type II hallucinations does not lead to a reduction in Type I hallucinations but rather that the two forms of hallucinations are often anti-correlated. To address this, we propose THRONE, a novel object-based automatic framework for quantitatively evaluating Type I hallucinations in LVLM free-form outputs. We use public language…
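
To make the idea of an object-based score for free-form output concrete, the following is a minimal sketch, not THRONE's actual metric. It assumes the set of object classes mentioned in a response has already been extracted and that the ground-truth set of objects in the image is known. The function name `object_precision_recall` and the example class names are hypothetical, chosen only for illustration.

```python
from typing import Set


def object_precision_recall(mentioned: Set[str], present: Set[str]) -> dict:
    """Score one free-form response at the object level.

    mentioned: object classes the response claims are in the image
               (assumed to be extracted beforehand by some other step).
    present:   object classes actually in the image (ground truth).
    """
    correct = mentioned & present        # objects mentioned and really there
    hallucinated = mentioned - present   # objects mentioned but absent

    # With no mentions there is nothing to hallucinate, so precision is 1.0;
    # with no ground-truth objects there is nothing to miss, so recall is 1.0.
    precision = len(correct) / len(mentioned) if mentioned else 1.0
    recall = len(correct) / len(present) if present else 1.0

    return {
        "precision": precision,
        "recall": recall,
        "hallucinated": sorted(hallucinated),
    }


# Hypothetical example: the response mentions a car that is not in the image.
scores = object_precision_recall(
    mentioned={"dog", "frisbee", "car"},
    present={"dog", "frisbee", "person", "tree"},
)
```

In this sketch, lower precision means more hallucinated objects, while recall captures how much of the image content the response covers. Reporting both guards against a model that avoids hallucination simply by saying very little.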

Code & Models

Repositories

haoyu-bu/CAFe
pytorch

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Focus