VideoHEDGE: Entropy-Based Hallucination Detection for Video-VLMs via Semantic Clustering and Spatiotemporal Perturbations

Sushant Gautam, Cise Midoglu, Vajira Thambawita, Michael A. Riegler, and Pål Halvorsen

arXiv:2601.08557 · cs.CV · January 14, 2026

TL;DR

VideoHEDGE is an entropy-based framework for detecting hallucinations in video-capable vision-language models. It clusters sampled answers into semantic hypotheses and compares generations from clean and perturbed clips to estimate answer reliability in video question answering.

Contribution

The paper extends entropy-based hallucination detection from images to videos using semantic clustering and spatiotemporal perturbations, with a new benchmark and open-source toolkit.

Findings

01

VASE achieves the highest ROC-AUC for hallucination detection across the evaluated models.

02

Embedding-based clustering matches NLI-based clustering in performance at lower cost.

03

Domain fine-tuning reduces hallucinations but only modestly improves calibration.

Abstract

Hallucinations in video-capable vision-language models (Video-VLMs) remain frequent and high-confidence, while existing uncertainty metrics often fail to align with correctness. We introduce VideoHEDGE, a modular framework for hallucination detection in video question answering that extends entropy-based reliability estimation from images to temporally structured inputs. Given a video-question pair, VideoHEDGE draws a baseline answer and multiple high-temperature generations from both clean clips and photometrically and spatiotemporally perturbed variants, then clusters the resulting textual outputs into semantic hypotheses using either Natural Language Inference (NLI)-based or embedding-based methods. Cluster-level probability masses yield three reliability scores: Semantic Entropy (SE), RadFlag, and Vision-Amplified Semantic Entropy (VASE). We evaluate VideoHEDGE on the SoccerChat…
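The clustering-then-entropy step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cluster_answers` and `semantic_entropy`, the greedy clustering rule, the string-matching equivalence test (a stand-in for the NLI-based or embedding-based check), and the assumption that each sampled answer carries equal probability mass are all hypothetical choices made for illustration. RadFlag and VASE are not shown, since their definitions are not given in the text above.

```python
import math
from collections import Counter


def cluster_answers(answers, same_meaning):
    """Greedily group answers: each answer joins the first cluster whose
    representative it matches, otherwise it starts a new cluster.
    `same_meaning` is a placeholder for an NLI- or embedding-based check."""
    representatives = []
    labels = []
    for answer in answers:
        for idx, rep in enumerate(representatives):
            if same_meaning(answer, rep):
                labels.append(idx)
                break
        else:
            representatives.append(answer)
            labels.append(len(representatives) - 1)
    return labels


def semantic_entropy(labels):
    """Shannon entropy (in nats) of the empirical cluster distribution.
    Assumption: every sampled answer contributes equal mass."""
    if not labels:
        raise ValueError("need at least one sampled answer")
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical sampled answers to one video-question pair.
samples = ["Goal", "goal", "Foul", "goal "]
# Toy equivalence test: case- and whitespace-insensitive string match.
toy_match = lambda a, b: a.strip().lower() == b.strip().lower()

labels = cluster_answers(samples, toy_match)
entropy = semantic_entropy(labels)
```

Under these assumptions, answers that all fall into one cluster give an entropy of zero, while answers spread over several clusters give a higher value, which is the signal an entropy-based detector would treat as lower reliability.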

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications