DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions

Xinran Wang; Yuxuan Zhang; Xiao Zhang; Haolong Yan; Muxi Diao; Songyu Xu; Zhonghao Yan; Hongbing Li; Kongming Liang; Zhanyu Ma

arXiv:2604.05623·cs.CV·April 8, 2026

TL;DR

DetailVerifyBench is a benchmark for evaluating how well models localize hallucinations in long image captions. It comprises 1,000 images across five domains, with densely annotated captions averaging over 200 words.

Contribution

The paper introduces the first comprehensive benchmark with token-level hallucination annotations for long image captioning across five domains.

Findings

01. The benchmark contains 1,000 images across five domains, with captions averaging over 200 words.

02. Dense, token-level annotations of multiple hallucination types are provided.

03. The benchmark is available at https://zyx-hhnkh.github.io/DetailVerifyBench/.

Abstract

Accurately detecting and localizing hallucinations is critical for ensuring the reliability of image captions. In the era of Multimodal Large Language Models (MLLMs), captions have evolved from brief sentences into comprehensive narratives, often spanning hundreds of words. This shift substantially increases the challenge: models must now pinpoint specific erroneous spans or words within extensive contexts, rather than merely flag response-level inconsistencies. However, existing benchmarks lack the fine granularity and domain diversity required to evaluate this capability. To bridge this gap, we introduce DetailVerifyBench, a rigorous benchmark comprising 1,000 high-quality images across five distinct domains. With an average caption length of over 200 words and dense, token-level annotations of multiple hallucination types, it stands as the most challenging benchmark for…
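To make the token-level localization task concrete, the sketch below scores predicted hallucination spans against gold spans for a single caption. It is a minimal illustration only: the abstract shown here does not state the benchmark's metric or annotation format, so the whitespace tokenization, the half-open `(start, end)` span convention, and the names `spans_to_token_set` and `score_localization` are all assumptions rather than the authors' protocol.

```python
from dataclasses import dataclass


@dataclass
class TokenScores:
    precision: float
    recall: float
    f1: float


def spans_to_token_set(spans):
    """Expand half-open (start, end) token spans into a set of token indices."""
    tokens = set()
    for start, end in spans:
        tokens.update(range(start, end))
    return tokens


def score_localization(gold_spans, pred_spans):
    """Token-level precision, recall, and F1 for one caption.

    Hypothetical scoring scheme: a predicted token counts as correct
    only if it falls inside a gold hallucinated span.
    """
    gold = spans_to_token_set(gold_spans)
    pred = spans_to_token_set(pred_spans)
    overlap = len(gold & pred)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return TokenScores(precision, recall, f1)


# Toy caption, tokenized on whitespace (an assumption, not the benchmark's tokenizer).
caption = "A red car is parked beside two tall trees near a lake".split()
gold_spans = [(1, 2), (6, 7)]    # gold hallucinated tokens: "red", "two"
pred_spans = [(1, 2), (10, 12)]  # model flags: "red", "a lake"

scores = score_localization(gold_spans, pred_spans)
print(scores)
```

In this toy case one of the three flagged tokens is correct and one of the two gold tokens is found. A response-level check would only report that the caption contains some hallucination, while token-level scoring also penalizes flagging the wrong words.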

Code & Models

Repositories

https://zyx-hhnkh.github.io/DetailVerifyBench

Datasets

zyxhhnkh/DetailVerifyBench
dataset · 223 dl
