SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding
Chang-Hsun Wu, Kai-Po Chang, Yu-Yang Sheng, Hung-Kai Chung, Kuei-Chun Wang, Yu-Chiang Frank Wang

TL;DR
SEASON is a training-free method that reduces temporal and spatial hallucinations in VideoLLMs by diagnosing each output token's hallucination tendency and applying adaptive contrastive decoding, improving factual consistency and video understanding accuracy.
Contribution
It introduces Self-Diagnostic Contrastive Decoding (SEASON), a novel approach that enhances temporal and spatial faithfulness in VideoLLMs without additional training.
Findings
Outperforms existing training-free hallucination mitigation methods.
Improves performance across multiple video understanding benchmarks.
Effectively reduces temporal and spatial hallucinations in VideoLLMs.
Abstract
Video Large Language Models (VideoLLMs) have shown remarkable progress in video understanding. However, these models still struggle to effectively perceive and exploit rich temporal information in videos when responding to user queries. As a result, they often generate descriptions of events that are temporally inconsistent or causally implausible, causing severe hallucination issues. While most prior studies have focused on spatial hallucinations (e.g., object mismatches), temporal reasoning in video understanding remains relatively underexplored. To address this issue, we propose Self-Diagnostic Contrastive Decoding (SEASON), a training-free method that adaptively enhances temporal and spatial faithfulness for each output token. It achieves this by dynamically diagnosing each token's hallucination tendency and applying adaptive contrastive decoding against its corresponding temporal and…
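To make the idea of per-token contrastive decoding concrete, here is a minimal Python sketch of the generic technique, not the authors' formulation. The abstract is truncated and does not state how the negatives are built or how the per-token weights are diagnosed, so everything below is an assumption for illustration: the function name `contrastive_logits`, the weights `w_temporal` and `w_spatial`, the combination rule, and the toy logits are all hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def contrastive_logits(original, temporal_neg, spatial_neg, w_temporal, w_spatial):
    """Generic two-negative contrastive decoding for one output token.

    Assumed rule (illustrative, not taken from the paper):
        (1 + wt + ws) * log p_orig - wt * log p_temporal_neg - ws * log p_spatial_neg
    The weights are plain inputs here; how SEASON diagnoses them per token
    is not specified in the abstract.
    """
    orig = log_softmax(np.asarray(original, dtype=float))
    t_neg = log_softmax(np.asarray(temporal_neg, dtype=float))
    s_neg = log_softmax(np.asarray(spatial_neg, dtype=float))
    return (1.0 + w_temporal + w_spatial) * orig - w_temporal * t_neg - w_spatial * s_neg


# Toy vocabulary of three tokens (made-up numbers).
original = [1.9, 2.0, 0.0]      # the model slightly prefers token 1
temporal_neg = [0.0, 3.0, 0.0]  # a temporally degraded input strongly prefers token 1
spatial_neg = [1.9, 2.0, 0.0]   # a spatially degraded input changes nothing here

# With zero weights the rule reduces to ordinary greedy decoding.
plain_choice = int(np.argmax(contrastive_logits(original, temporal_neg, spatial_neg, 0.0, 0.0)))

# With a temporal weight, the token favored by the degraded input is penalized.
contrast_choice = int(np.argmax(contrastive_logits(original, temporal_neg, spatial_neg, 1.0, 0.0)))
```

In this toy case plain decoding selects token 1, while the contrastive rule with `w_temporal = 1.0` selects token 0, because token 1 is the one the temporally degraded input prefers. An adaptive scheme would set the two weights separately for each generated token rather than fixing them.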
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Human Pose and Action Recognition
