VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding

Chaoyu Li; Eun Woo Im; Pooyan Fazli

arXiv:2412.03735·cs.CV·April 2, 2025

TL;DR

This paper introduces VidHalluc, a comprehensive benchmark for evaluating hallucinations in multimodal large language models for video understanding, and proposes DINO-HEAL, a training-free method to reduce such hallucinations.

Contribution

The paper presents the largest benchmark for hallucinations in video MLLMs and a novel, training-free method to mitigate hallucinations during inference.

Findings

01

Most MLLMs are vulnerable to hallucinations in video understanding.

02

VidHalluc effectively identifies hallucination-prone cases across action, temporal sequence, and scene transition.

03

DINO-HEAL improves hallucination mitigation by an average of 3.02% in tests.

Abstract

Multimodal large language models (MLLMs) have recently shown significant advancements in video understanding, excelling in content reasoning and instruction-following tasks. However, hallucination, where models generate inaccurate or misleading content, remains underexplored in the video domain. Building on the observation that MLLM visual encoders often fail to distinguish visually different yet semantically similar video pairs, we introduce VidHalluc, the largest benchmark designed to examine hallucinations in MLLMs for video understanding. It consists of 5,002 videos, paired to highlight cases prone to hallucinations. VidHalluc assesses hallucinations across three critical dimensions: (1) action, (2) temporal sequence, and (3) scene transition. Comprehensive testing shows that most MLLMs are vulnerable to hallucinations across these dimensions. Furthermore, we propose DINO-HEAL, a…
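The pairing idea in the abstract (video pairs that are visually different yet semantically similar) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' pipeline: the function names, the two embedding spaces (one "semantic", one "visual"), and the thresholds `sem_min` and `vis_max` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_hallucination_prone(sem_a, sem_b, vis_a, vis_b, sem_min=0.9, vis_max=0.6):
    """Flag a video pair as a candidate hallucination-prone case.

    Hypothetical criterion: the pair looks alike in a semantic embedding
    space (similarity >= sem_min) while a visual embedding space separates
    the two videos (similarity <= vis_max). Thresholds are illustrative only.
    """
    return cosine(sem_a, sem_b) >= sem_min and cosine(vis_a, vis_b) <= vis_max


# Toy embeddings: nearly identical semantics, orthogonal visual features.
pair_flagged = is_hallucination_prone([1.0, 0.0], [1.0, 0.1], [1.0, 0.0], [0.0, 1.0])
# Same semantics, but the visual features also match, so the pair is not flagged.
pair_not_flagged = is_hallucination_prone([1.0, 0.0], [1.0, 0.1], [1.0, 0.0], [1.0, 0.0])
```

In a real setting the vectors would come from actual video encoders, and the thresholds would need to be tuned on data; the sketch only shows the shape of a "similar in one space, different in another" filter.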

Taxonomy

Topics: Misinformation and Its Impacts · Anomaly Detection Techniques and Applications