SmartSight: Mitigating Hallucination in Video-LLMs Without Compromising Video Understanding via Temporal Attention Collapse

Yiming Sun; Mi Zhang; Feifei Li; Geng Hong; Min Yang

arXiv:2512.18671·cs.CV·December 23, 2025

TL;DR

SmartSight is a training-free method that reduces hallucinations in Video-LLMs by using introspective scoring and attention analysis, improving reliability without sacrificing video understanding.

Contribution

It introduces a novel introspective approach leveraging temporal attention analysis to mitigate hallucinations without retraining the model.

Findings

01. Reduces hallucinations by 10.59% on VRIPT-HAL

02. Enhances video understanding and reasoning performance by up to 8.86%

03. Achieves this with lower decoding costs through early response termination

Abstract

Despite Video Large Language Models having rapidly advanced in recent years, perceptual hallucinations pose a substantial safety risk, which severely restricts their real-world applicability. While several methods for hallucination mitigation have been proposed, they often compromise the model's capacity for video understanding and reasoning. In this work, we propose SmartSight, a pioneering step to address this issue in a training-free manner by leveraging the model's own introspective capabilities. Specifically, SmartSight generates multiple candidate responses to uncover low-hallucinated outputs that are often obscured by standard greedy decoding. It assesses the hallucination of each response using the Temporal Attention Collapse score, which measures whether the model over-focuses on trivial temporal regions of the input video when generating the response. To improve efficiency,…
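The candidate-selection idea described in the abstract can be sketched in a few lines. The sketch below is illustrative only: the excerpt does not define the Temporal Attention Collapse score, so `attention_concentration` is a hypothetical stand-in that measures how strongly a response's attention mass piles onto a few frames (one minus normalized entropy). It does not check whether those frames are trivial, which the paper's score is said to capture. The function names and the per-frame attention inputs are assumptions, not the authors' API.

```python
import math
from typing import List, Sequence, Tuple


def attention_concentration(frame_attention: Sequence[float]) -> float:
    """Hypothetical proxy score, NOT the paper's definition.

    Returns 1 - normalized entropy of the attention mass over frames:
    0.0 means attention is spread uniformly, 1.0 means it sits on one frame.
    """
    if len(frame_attention) < 2:
        raise ValueError("need attention over at least two frames")
    total = sum(frame_attention)
    if total <= 0:
        raise ValueError("attention mass must be positive")
    probs = [a / total for a in frame_attention]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def select_candidate(
    candidates: Sequence[str],
    frame_attentions: Sequence[Sequence[float]],
) -> Tuple[str, List[float]]:
    """Score every candidate response and return the least 'collapsed' one.

    frame_attentions[i] is assumed to hold the attention mass that
    candidate i placed on each video frame while it was generated.
    """
    if not candidates or len(candidates) != len(frame_attentions):
        raise ValueError("need one attention profile per candidate")
    scores = [attention_concentration(a) for a in frame_attentions]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


if __name__ == "__main__":
    responses = ["A dog jumps over a fence.", "A dog walks along a fence."]
    attentions = [
        [0.97, 0.01, 0.01, 0.01],  # nearly all mass on one frame
        [0.30, 0.20, 0.25, 0.25],  # mass spread across frames
    ]
    chosen, scores = select_candidate(responses, attentions)
    print(chosen, scores)
```

In practice the candidates would come from sampled decoding rather than greedy decoding, and the per-frame attention would have to be aggregated from the model's attention maps; both steps are outside this sketch.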

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ferroelectric and Negative Capacitance Devices · Digital Media Forensic Detection