VideoHEDGE: Entropy-Based Hallucination Detection for Video-VLMs via Semantic Clustering and Spatiotemporal Perturbations
Sushant Gautam, Cise Midoglu, Vajira Thambawita, Michael A. Riegler, and Pål Halvorsen

TL;DR
VideoHEDGE is an entropy-based framework for detecting hallucinations in video-language models. It clusters sampled answers into semantic hypotheses and compares generations from clean and perturbed clips, with the goal of improving reliability estimation in video question answering.
Contribution
The paper extends entropy-based hallucination detection from images to videos using semantic clustering and spatiotemporal perturbations, with a new benchmark and open-source toolkit.
Findings
VASE achieves the highest ROC-AUC for hallucination detection across models.
Embedding-based clustering matches NLI-based clustering in performance at lower cost.
Domain fine-tuning reduces hallucinations but improves calibration only modestly.
Abstract
Hallucinations in video-capable vision-language models (Video-VLMs) remain frequent and high-confidence, while existing uncertainty metrics often fail to align with correctness. We introduce VideoHEDGE, a modular framework for hallucination detection in video question answering that extends entropy-based reliability estimation from images to temporally structured inputs. Given a video-question pair, VideoHEDGE draws a baseline answer and multiple high-temperature generations from both clean clips and photometrically and spatiotemporally perturbed variants, then clusters the resulting textual outputs into semantic hypotheses using either Natural Language Inference (NLI)-based or embedding-based methods. Cluster-level probability masses yield three reliability scores: Semantic Entropy (SE), RadFlag, and Vision-Amplified Semantic Entropy (VASE). We evaluate VideoHEDGE on the SoccerChat…
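The step from semantic clusters to a Semantic Entropy score can be illustrated with a minimal sketch. It assumes the sampled answers have already been assigned cluster IDs (by NLI or embeddings) and that every sample carries equal weight, so a cluster's probability mass is simply its share of the samples. The abstract does not say how the masses are estimated, and the function names `cluster_masses` and `semantic_entropy` are hypothetical, not taken from the VideoHEDGE toolkit. RadFlag and VASE are not covered here.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def cluster_masses(cluster_ids: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Empirical probability mass of each semantic cluster.

    Assumption: each sampled answer counts equally, so a cluster's mass
    is the fraction of samples assigned to it.
    """
    if not cluster_ids:
        raise ValueError("need at least one sampled answer")
    total = len(cluster_ids)
    return {cid: n / total for cid, n in Counter(cluster_ids).items()}


def semantic_entropy(cluster_ids: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the cluster-level mass distribution."""
    masses = cluster_masses(cluster_ids)
    return sum(-p * math.log(p) for p in masses.values())


# Hypothetical example: four sampled answers, three of which landed in
# the same semantic cluster (ID 0) and one in a different cluster (ID 1).
example_ids = [0, 0, 0, 1]
example_score = semantic_entropy(example_ids)
```

Under these assumptions the score is zero when all samples fall into one cluster and grows as the samples spread across more hypotheses, which is the behaviour an entropy-based reliability score relies on.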
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
