Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding

Yuansheng Gao; Jinman Zhao; Tong Zhang; Xingguo Xu; Han Bao; Zonghui Wang; Wenzhi Chen

arXiv:2601.22574·cs.CV·February 2, 2026

Open Access

TL;DR

This paper introduces a novel decoding method for Video Large Language Models that reduces hallucinations by contrasting disrupted spatiotemporal and semantic features, improving factual accuracy without sacrificing understanding.

Contribution

The paper presents Spatiotemporal-Semantic Contrastive Decoding, a new approach that explicitly targets hallucination root causes by disrupting and contrasting video features during decoding.

Findings

01 Significantly reduces hallucinations in video LLM outputs

02 Maintains core video understanding and reasoning capabilities

03 Demonstrates robustness across complex scenarios

Abstract

Although Video Large Language Models perform remarkably well across tasks such as video understanding, question answering, and reasoning, they still suffer from hallucination: generating outputs that are inconsistent with explicit video content or factual evidence. Existing decoding methods for mitigating video hallucinations consider the spatiotemporal characteristics of videos but mostly rely on heuristic designs. As a result, they fail to precisely capture the root causes of hallucinations and their fine-grained temporal and semantic correlations, which limits their robustness and generalization in complex scenarios. To mitigate video hallucinations more effectively, we propose a novel decoding strategy termed Spatiotemporal-Semantic Contrastive Decoding. This strategy constructs negative features by deliberately disrupting the…
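
To make the general idea of contrastive decoding concrete, the sketch below shows one decoding step in the generic form used by earlier contrastive decoding work. It is not the paper's method: the abstract is truncated before it says how the negative features are built or how the two branches are combined. The function names, the weight `alpha`, the plausibility threshold `beta`, and the toy logits are all assumptions made for illustration. `pos_logits` stands in for next-token logits conditioned on the intact video, and `neg_logits` for logits conditioned on a deliberately disrupted version.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_step(pos_logits, neg_logits, alpha=1.0, beta=0.1):
    """Generic contrastive decoding scores for one step (illustrative only).

    alpha (assumed): how strongly the negative branch is subtracted.
    beta (assumed): keep only tokens whose probability under the positive
    branch is at least beta times that of its most likely token.
    """
    pos_lp = log_softmax(pos_logits)
    neg_lp = log_softmax(neg_logits)
    cutoff = max(pos_lp) + math.log(beta)
    scores = []
    for p, n in zip(pos_lp, neg_lp):
        if p >= cutoff:
            # Boost tokens the intact input supports more than the disrupted one.
            scores.append((1 + alpha) * p - alpha * n)
        else:
            # Implausible under the positive branch: never selected.
            scores.append(float("-inf"))
    return scores


def greedy_pick(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 3-token vocabulary (made-up numbers). Token 0 is favoured even when the
# video is disrupted, which suggests it comes from language priors; token 1
# depends on the intact video.
pos_logits = [2.0, 1.9, -3.0]
neg_logits = [2.5, 0.0, -3.0]

plain_choice = greedy_pick(pos_logits)
contrastive_choice = greedy_pick(contrastive_step(pos_logits, neg_logits))
```

In this toy case plain greedy decoding selects token 0, while the contrastive scores select token 1, because token 0 remains likely under the disrupted input and is penalised for it. The plausibility cutoff keeps the subtraction from promoting tokens that the positive branch itself considers very unlikely.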

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Ferroelectric and Negative Capacitance Devices