VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

Wey Yeh Choong; Yangyang Guo; Mohan Kankanhalli

arXiv:2411.16771·cs.CV·April 24, 2026·2 cites

TL;DR

VidHal is a new benchmark for evaluating and analyzing video-based hallucinations in Vision Large Language Models (VLLMs). It highlights the models' limitations and aims to guide future improvements.

Contribution

The paper introduces VidHal, a novel benchmark with a caption ordering task for assessing video-based hallucinations in VLLMs, addressing limitations of existing evaluation methods.
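
To make the caption ordering idea concrete, here is a minimal sketch of how a predicted ordering could be compared against a reference ordering. Everything in it is an assumption for illustration: the function names `pairwise_agreement` and `exact_match`, the caption ids "A", "B", "C", and the pairwise-agreement score itself are hypothetical and are not taken from the paper, whose actual metric is not stated in this summary.

```python
from itertools import combinations


def pairwise_agreement(predicted, reference):
    """Fraction of caption pairs whose relative order matches the reference.

    Hypothetical scoring rule, not the paper's metric. Both arguments are
    sequences of caption ids listed in ranked order.
    """
    if sorted(predicted) != sorted(reference):
        raise ValueError("predicted and reference must contain the same caption ids")
    # Position of each caption in the model's predicted ordering.
    rank = {caption: position for position, caption in enumerate(predicted)}
    # Every pair (a, b) where a precedes b in the reference ordering.
    pairs = list(combinations(reference, 2))
    if not pairs:
        return 1.0  # zero or one caption: nothing to disagree on
    agreeing = sum(1 for a, b in pairs if rank[a] < rank[b])
    return agreeing / len(pairs)


def exact_match(predicted, reference):
    """True only if the full predicted ordering equals the reference."""
    return list(predicted) == list(reference)


# Hypothetical example: three captions for one video, in reference order.
reference = ["A", "B", "C"]
predicted = ["A", "C", "B"]  # the model swaps the last two captions
print(pairwise_agreement(predicted, reference))
print(exact_match(predicted, reference))
```

A pairwise score of this kind gives partial credit when a model gets most of the order right, whereas exact match treats any single swap as a complete failure. That difference is one way a ranking task can surface finer-grained errors than a single right-or-wrong answer.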

Findings

01 Existing VLLMs show significant hallucination issues on videos.

02 VidHal reveals models' limitations in handling spatiotemporal information.

03 The benchmark encourages the development of more accurate VLLMs for video understanding.

Abstract

Vision Large Language Models (VLLMs) are widely acknowledged to be prone to hallucinations. Existing research addressing this problem has primarily been confined to image inputs, with limited exploration of video-based hallucinations. Furthermore, current evaluation methods fail to capture nuanced errors in generated responses, which are often exacerbated by the rich spatiotemporal dynamics of videos. To address this, we introduce VidHal, a benchmark specially designed to evaluate video-based hallucinations in VLLMs. VidHal is constructed by bootstrapping video instances across a wide range of common temporal aspects. A defining feature of our benchmark lies in the careful creation of captions which represent varying levels of hallucination associated with each video. To enable fine-grained evaluation, we propose a novel caption ordering task requiring VLLMs to rank captions by…
