Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering

Jianfeng Cai; Wengang Zhou; Zongmeng Zhang; Jiale Hong; Nianji Zhan; Houqiang Li

arXiv:2505.12826·cs.CV·May 20, 2025

TL;DR

This paper introduces a temporal-aware activation engineering method to reduce hallucinations in VideoLLMs by focusing on temporal variation sensitivity, achieving significant improvements without extra fine-tuning.

Contribution

It is the first to explore activation engineering for hallucination mitigation in VideoLLMs, emphasizing temporal variation as a key factor and proposing a novel adaptive framework.

Findings

01. Significantly reduces hallucinations across multiple VideoLLMs and benchmarks.

02. Identifies temporal variation as a critical factor influencing hallucination sensitivity.

03. Demonstrates robustness of the proposed method without additional fine-tuning.

Abstract

Multimodal large language models (MLLMs) have achieved remarkable progress in video understanding. However, hallucination, where the model generates plausible yet incorrect outputs, persists as a significant and under-addressed challenge in the video domain. Among existing solutions, activation engineering has proven successful in mitigating hallucinations in LLMs and ImageLLMs, yet its applicability to VideoLLMs remains largely unexplored. In this work, we are the first to systematically investigate the effectiveness and underlying mechanisms of activation engineering for mitigating hallucinations in VideoLLMs. We first investigate the key factors affecting the performance of activation engineering and find that a model's sensitivity to hallucination depends on temporal variation rather than task type. Moreover, selecting appropriate internal modules and…
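For readers unfamiliar with activation engineering, the general idea can be sketched as follows. This is a generic mean-difference steering example and not the paper's temporal-aware framework, whose details are not given in the truncated abstract above. The function names, the random placeholder activations, and the scaling factor `alpha` are all assumptions made for illustration.

```python
import numpy as np


def steering_vector(grounded_acts, hallucinated_acts):
    """Mean difference between two sets of activations, each of shape (n, d)."""
    return grounded_acts.mean(axis=0) - hallucinated_acts.mean(axis=0)


def steer(hidden, vector, alpha=1.0):
    """Shift each token's hidden state (shape (tokens, d)) along the vector."""
    return hidden + alpha * vector


# Placeholder data standing in for activations collected from a model
# (hypothetical; a real setup would record them from a chosen internal module).
rng = np.random.default_rng(0)
d = 8
grounded = rng.normal(size=(16, d)) + 1.0
hallucinated = rng.normal(size=(16, d))

v = steering_vector(grounded, hallucinated)
hidden = rng.normal(size=(4, d))          # hidden states for 4 tokens
steered = steer(hidden, v, alpha=2.0)     # applied at inference, no fine-tuning
```

In this sketch the intervention changes only activations at inference time, which is why approaches of this kind need no additional training; which module to steer and how strongly are the design choices the abstract says matter.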


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Emotion and Mood Recognition · Multimodal Machine Learning Applications