VERHallu: Evaluating and Mitigating Event Relation Hallucination in Video Large Language Models
Zefan Zhang, Kehua Zhu, Shijie Jiang, Hongyuan Lu, Shengkai Sun, Tian Bai

TL;DR
This paper introduces VERHallu, a benchmark for evaluating event relation hallucination in VideoLLMs, revealing current models' struggles with dense-event reasoning and proposing a Key-Frame Propagating strategy to improve understanding.
Contribution
The paper presents a new benchmark for event relation hallucination and a novel Key-Frame Propagating method to mitigate hallucinations in VideoLLMs.
Findings
Current models rely on prior knowledge, neglecting frame cues.
Models excel at key event grounding but miss surrounding subevents.
The Key-Frame Propagating (KFP) strategy improves event relation understanding without slowing inference.
Abstract
Video Large Language Models (VideoLLMs) exhibit various types of hallucinations. Existing research has primarily focused on hallucinations involving the presence of events, objects, and scenes in videos, while largely neglecting event relation hallucination. In this paper, we introduce VERHallu, a novel benchmark for evaluating video event relation hallucination. The benchmark focuses on causal, temporal, and subevent relations between events and comprises three types of tasks (relation classification, question answering, and counterfactual question answering) for a comprehensive evaluation of event relation hallucination. It also features counterintuitive video scenarios that deviate from typical pretraining distributions, and each sample is accompanied by human-annotated candidates covering both vision-language and pure language biases. Our analysis reveals that…
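To make the benchmark layout described in the abstract concrete, the following is a minimal sketch of how one might represent a sample and score predictions per relation type and task. Only the three relation names and three task names come from the abstract; the `Sample` fields, the `accuracy_by_group` helper, and the plain-accuracy scoring rule are hypothetical illustrations and are not taken from the paper or its released data.

```python
from collections import defaultdict
from dataclasses import dataclass

# Relation and task names follow the abstract; all field names and the
# scoring rule below are assumptions made for illustration only.
RELATIONS = ("causal", "temporal", "subevent")
TASKS = ("relation_classification", "question_answering", "counterfactual_qa")


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one multiple-choice benchmark item."""
    video_id: str
    relation: str        # one of RELATIONS
    task: str            # one of TASKS
    candidates: tuple    # human-annotated answer options
    answer_index: int    # index of the correct candidate

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation!r}")
        if self.task not in TASKS:
            raise ValueError(f"unknown task: {self.task!r}")
        if not 0 <= self.answer_index < len(self.candidates):
            raise ValueError("answer_index out of range")


def accuracy_by_group(samples, predictions):
    """Return accuracy keyed by (relation, task).

    `predictions` holds one predicted candidate index per sample.
    """
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions differ in length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample, predicted in zip(samples, predictions):
        key = (sample.relation, sample.task)
        total[key] += 1
        correct[key] += int(predicted == sample.answer_index)
    return {key: correct[key] / total[key] for key in total}
```

Grouping by relation and task keeps the causal, temporal, and subevent results separate, so a weakness on one relation type is not hidden by an aggregate score. The metrics actually reported by the paper may differ from this plain accuracy.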
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
