HeadHunt-VAD: Hunting Robust Anomaly-Sensitive Heads in MLLM for Tuning-Free Video Anomaly Detection
Zhaolin Cai, Fan Li, Ziwei Zheng, Haixia Bi, Lijun He

TL;DR
HeadHunt-VAD is a tuning-free video anomaly detection method that identifies robust anomaly-sensitive attention heads inside a frozen multimodal large language model and uses them directly, achieving state-of-the-art performance among tuning-free methods while remaining efficient.
Contribution
The paper proposes a novel approach to VAD by directly hunting anomaly-sensitive attention heads in MLLMs, bypassing textual outputs and prompt sensitivity issues.
Findings
Achieves state-of-the-art performance among tuning-free methods.
Maintains high efficiency in anomaly detection.
Provides interpretable outputs through head-level probing.
Abstract
Video Anomaly Detection (VAD) aims to locate events that deviate from normal patterns in videos. Traditional approaches often rely on extensive labeled data and incur high computational costs. Recent tuning-free methods based on Multimodal Large Language Models (MLLMs) offer a promising alternative by leveraging their rich world knowledge. However, these methods typically rely on textual outputs, which introduces information loss, exhibits normalcy bias, and suffers from prompt sensitivity, making them insufficient for capturing subtle anomalous cues. To address these constraints, we propose HeadHunt-VAD, a novel tuning-free VAD paradigm that bypasses textual generation by directly hunting robust anomaly-sensitive internal attention heads within the frozen MLLM. Central to our method is a Robust Head Identification module that systematically evaluates all attention heads using a…
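As a rough illustration of what head-level hunting could look like, the sketch below ranks attention heads by how well a scalar per-clip response separates normal clips from anomalous ones. The abstract is cut off before it names the evaluation criterion, so the Fisher-style separability score here is a stand-in assumption rather than the paper's method. The function names (head_separability, hunt_heads), the dictionary layout keyed by (layer, head), and the toy numbers are all hypothetical.

```python
from statistics import mean, pvariance


def head_separability(normal, anomalous, eps=1e-8):
    """Fisher-style score for one head (assumed criterion, not the paper's):
    squared gap between class means divided by the summed class variances."""
    gap = mean(anomalous) - mean(normal)
    return gap * gap / (pvariance(normal) + pvariance(anomalous) + eps)


def hunt_heads(normal_by_head, anomalous_by_head, top_k):
    """Score every head and return the top_k keys plus all scores."""
    scores = {
        key: head_separability(normal_by_head[key], anomalous_by_head[key])
        for key in normal_by_head
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Toy data: one scalar response per clip for each (layer, head) pair.
# A real pipeline would read these from the frozen model's attention heads.
normal_by_head = {
    (0, 0): [0.10, 0.12, 0.11, 0.09],
    (0, 1): [0.50, 0.48, 0.52, 0.51],
    (1, 0): [0.30, 0.31, 0.29, 0.30],
}
anomalous_by_head = {
    (0, 0): [0.11, 0.10, 0.12, 0.10],
    (0, 1): [0.90, 0.88, 0.93, 0.91],
    (1, 0): [0.35, 0.27, 0.33, 0.28],
}

selected, scores = hunt_heads(normal_by_head, anomalous_by_head, top_k=1)
print(selected)
```

In this toy setup only head (0, 1) responds differently to the two classes, so it is the one selected. Because the model stays frozen and only the selected heads are read out, a procedure of this shape needs no gradient updates, which is consistent with the tuning-free framing above.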
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Video Analysis and Summarization
