Hallucination Localization in Video Captioning

Shota Nakada; Kazuhiro Saito; Yuchi Ishikawa; Hokuto Munakata; Tatsuya Komatsu; Masayoshi Kondo

arXiv:2510.25225·cs.MM·October 30, 2025

TL;DR

This paper introduces hallucination localization in video captioning, a task that identifies hallucinated words or phrases at the span level, and provides a benchmark dataset for evaluating how well current models perform it.

Contribution

It proposes a span-level hallucination localization task for video captioning, constructs HLVC-Dataset from 1,167 manually annotated video-caption pairs, and benchmarks a VideoLLM-based baseline on the new task.

Findings

01

Baseline methods show room for improvement in hallucination localization.

02

HLVC-Dataset enables detailed evaluation of hallucination detection.

03

Quantitative and qualitative analyses highlight current challenges.

Abstract

We propose a novel task, hallucination localization in video captioning, which aims to identify hallucinations in video captions at the span level (i.e., individual words or phrases). This allows for a more detailed analysis of hallucinations than the existing sentence-level hallucination detection task. To establish a benchmark for hallucination localization, we construct HLVC-Dataset, a carefully curated dataset created by manually annotating 1,167 video-caption pairs whose captions were generated by VideoLLMs. We further implement a VideoLLM-based baseline method and conduct quantitative and qualitative evaluations to benchmark current performance on hallucination localization.
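To make the difference between span-level and sentence-level outputs concrete, the following is a minimal Python sketch of how hallucinated spans could be represented and scored. Everything in it is an assumption for illustration: the `Span` type, the character-offset convention, the example caption, and the character-overlap precision/recall/F1 score are hypothetical and are not taken from the paper, which may use a different annotation format and different metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A hallucinated region of a caption (hypothetical format)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def covered_chars(spans):
    """Return the set of character positions covered by the spans."""
    covered = set()
    for span in spans:
        covered.update(range(span.start, span.end))
    return covered


def span_overlap_scores(gold, pred):
    """Character-overlap precision, recall, and F1 between two span lists.

    This is an assumed metric chosen for illustration only.
    """
    gold_chars = covered_chars(gold)
    pred_chars = covered_chars(pred)
    if not gold_chars and not pred_chars:
        return 1.0, 1.0, 1.0  # nothing to find, nothing predicted
    overlap = len(gold_chars & pred_chars)
    precision = overlap / len(pred_chars) if pred_chars else 0.0
    recall = overlap / len(gold_chars) if gold_chars else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def find_span(caption, phrase):
    """Locate the first occurrence of a phrase and return it as a Span."""
    start = caption.index(phrase)
    return Span(start, start + len(phrase))


# Made-up caption; assume "red shirt" is not supported by the video.
caption = "A man in a red shirt rides a horse on the beach."
gold = [find_span(caption, "red shirt")]
pred = [find_span(caption, "red")]  # a model that flags only part of the span

precision, recall, f1 = span_overlap_scores(gold, pred)
```

A sentence-level detector would only mark this whole caption as hallucinated, whereas the span-level output points at the specific phrase. In this sketch a partially correct prediction still earns partial credit, which is one possible design choice among several (exact-match scoring is another).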
