SphereVAD: Training-Free Video Anomaly Detection via Geodesic Inference on the Unit Hypersphere
Chao Huang, Penfei Wei, Wei Wang, Jie Wen, Zhihua Wang, Li Shen, Wenqi Ren, Xiaochun Cao

TL;DR
SphereVAD is a training-free video anomaly detection framework that applies geometric inference on the unit hypersphere to features from pre-trained multimodal models, identifying anomalies without task-specific training.
Contribution
Proposes a zero-shot VAD method based on geodesic inference on the unit hypersphere, removing the need for large-scale annotations or task-specific training.
Findings
Achieves state-of-the-art results among training-free methods on major benchmarks.
Remains competitive with fully supervised approaches.
Requires only minimal synthetic images for calibration.
Abstract
Video anomaly detection (VAD) aims to automatically identify events that deviate from normal patterns in untrimmed surveillance videos. Existing methods universally depend on large-scale annotations or task-specific training procedures, severely limiting their rapid deployment to novel scenes. We observe that intermediate-layer features of pre-trained multimodal large language models (MLLMs) already encode rich anomaly semantics, yet existing approaches rely on the language output pathway and fail to exploit the geometric discriminability latent in these representations. Based on this finding, we propose SphereVAD, a fully training-free, zero-shot VAD framework that recasts anomaly discrimination as von Mises-Fisher (vMF) likelihood-ratio geodesic inference on the unit hypersphere, unleashing latent discriminability through principled geometric reasoning rather than learning new…
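To make the phrase "vMF likelihood-ratio geodesic inference on the unit hypersphere" concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes two von Mises-Fisher components (one "normal", one "anomalous") that share a single concentration `kappa`, so their normalizing constants cancel and the log-likelihood ratio reduces to a scaled difference of cosine similarities. The function names, the mean directions `mu_norm` and `mu_anom`, the value of `kappa`, and the toy 3-D features are all hypothetical; the abstract does not specify how SphereVAD obtains these quantities.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project feature vectors onto the unit hypersphere (row-wise)."""
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def geodesic_distance(x, mu):
    """Great-circle distance in radians between unit vectors x and mu."""
    cos = np.clip(x @ mu, -1.0, 1.0)  # clip guards against rounding error
    return np.arccos(cos)


def vmf_log_likelihood_ratio(x, mu_anom, mu_norm, kappa=10.0):
    """log p(x | anomalous) - log p(x | normal) for two vMF densities.

    Assumption: both components share the same concentration kappa, so the
    vMF normalizers cancel and only the directional terms remain.
    """
    return kappa * (x @ mu_anom - x @ mu_norm)


# Hypothetical mean directions and frame features (toy 3-D example).
mu_norm = l2_normalize([1.0, 0.0, 0.0])
mu_anom = l2_normalize([0.0, 1.0, 0.0])
frames = l2_normalize([[2.0, 0.1, 0.0],   # close to the "normal" direction
                       [0.1, 3.0, 0.0]])  # close to the "anomalous" direction

scores = vmf_log_likelihood_ratio(frames, mu_anom, mu_norm)
d_norm = geodesic_distance(frames, mu_norm)
d_anom = geodesic_distance(frames, mu_anom)
is_anomalous = scores > 0.0
```

Under the shared-`kappa` assumption, a positive score is equivalent to the feature lying geodesically closer to `mu_anom` than to `mu_norm`, because arccos is monotonically decreasing in the cosine similarity. With unequal concentrations the normalizing constants no longer cancel and would have to be included.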
