VERHallu: Evaluating and Mitigating Event Relation Hallucination in Video Large Language Models

Zefan Zhang; Kehua Zhu; Shijie Jiang; Hongyuan Lu; Shengkai Sun; Tian Bai

arXiv:2601.10010 · cs.CV · January 16, 2026

Open Access

TL;DR

This paper introduces VERHallu, a benchmark for evaluating event relation hallucination in VideoLLMs. The benchmark shows that current models struggle with reasoning over dense events, and the paper proposes a Key-Frame Propagating (KFP) strategy to improve event relation understanding.

Contribution

The paper presents a new benchmark for event relation hallucination and a novel Key-Frame Propagating method to mitigate hallucinations in VideoLLMs.

Findings

1. Current models rely on prior knowledge and neglect cues in the video frames.
2. Models excel at grounding the key event but miss the surrounding subevents.
3. The KFP strategy improves event relation understanding without slowing inference.

Abstract

Video Large Language Models (VideoLLMs) exhibit various types of hallucinations. Existing research has focused primarily on hallucinations about the presence of events, objects, and scenes in videos, while largely neglecting event relation hallucination. In this paper, we introduce VERHallu, a novel benchmark for evaluating Video Event Relation Hallucination. The benchmark focuses on causal, temporal, and subevent relations between events and comprises three types of tasks (relation classification, question answering, and counterfactual question answering) for a comprehensive evaluation of event relation hallucination. It also features counterintuitive video scenarios that deviate from typical pretraining distributions, and each sample is accompanied by human-annotated candidates covering both vision-language and pure language biases. Our analysis reveals that…
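
The abstract organizes the benchmark along two axes: three relation types and three task types. Each sample also carries human-annotated candidates that target vision-language or pure language biases. The sketch below is a minimal, hypothetical way to represent such samples and to break a model's results down by relation type and by the bias of the distractor it picked. The `Sample` class, its field names, and both scoring functions are illustrative assumptions. They are not VERHallu's released data format or its official metrics.

```python
from collections import defaultdict
from dataclasses import dataclass

# Relation and task types named in the abstract.
RELATIONS = ("causal", "temporal", "subevent")
TASKS = ("relation_classification", "question_answering", "counterfactual_qa")


@dataclass
class Sample:
    # Hypothetical schema for illustration only; not the benchmark's real format.
    relation: str                 # one of RELATIONS
    task: str                     # one of TASKS
    candidates: list              # human-annotated answer candidates
    answer: str                   # the correct candidate
    distractor_bias: dict         # hypothetical: wrong candidate -> "vision_language" or "language"

    def __post_init__(self):
        # Reject samples that fall outside the relation/task grid or lack a valid answer.
        if self.relation not in RELATIONS or self.task not in TASKS:
            raise ValueError(f"unknown relation/task: {self.relation}/{self.task}")
        if self.answer not in self.candidates:
            raise ValueError("answer must be one of the candidates")


def accuracy_by_relation(samples, predictions):
    """Accuracy per relation type; predictions[i] is the candidate chosen for samples[i]."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    correct, total = defaultdict(int), defaultdict(int)
    for sample, pred in zip(samples, predictions):
        total[sample.relation] += 1
        correct[sample.relation] += int(pred == sample.answer)
    return {rel: correct[rel] / total[rel] for rel in total}


def errors_by_bias(samples, predictions):
    """Count wrong answers by the (hypothetical) bias label of the chosen distractor."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    hits = defaultdict(int)
    for sample, pred in zip(samples, predictions):
        if pred != sample.answer:
            hits[sample.distractor_bias.get(pred, "unlabeled")] += 1
    return dict(hits)
```

Splitting errors by distractor bias is one way to check whether a model leans on language priors instead of frame evidence, which is the failure mode the findings describe. A real evaluation would follow the paper's own metrics.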

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning