Geometry-Aware Semantic Reasoning for Training Free Video Anomaly Detection
Ali Zia, Usman Ali, Muhammad Umer Ramzan, Hamza Abid, Abdul Rehman, and Wei Xiang

TL;DR
This paper introduces MM-VAD, a geometry-aware semantic reasoning framework for training-free video anomaly detection that improves stability and interpretability by leveraging hyperbolic space representations and adaptive inference.
Contribution
It proposes a novel geometry-aware approach that reframes anomaly detection as adaptive test-time inference using hyperbolic embeddings and semantic reasoning, without training.
Findings
Achieves 90.03% AUC on XD-Violence
Outperforms prior training-free methods on multiple benchmarks
Provides more stable and interpretable anomaly predictions
Abstract
Training-free video anomaly detection (VAD) has recently emerged as a scalable alternative to supervised approaches, yet existing methods largely rely on static prompting and geometry-agnostic feature fusion. As a result, anomaly inference is often reduced to shallow similarity matching over Euclidean embeddings, leading to unstable predictions and limited interpretability, especially in complex or hierarchically structured scenes. We introduce MM-VAD, a geometry-aware semantic reasoning framework for training-free VAD that reframes anomaly detection as adaptive test-time inference rather than fixed feature comparison. Our approach projects caption-derived scene representations into hyperbolic space to better preserve hierarchical structure and performs anomaly assessment through an adaptive question-answering process over a frozen large language model. A lightweight, learnable prompt…
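The abstract does not say which model of hyperbolic space MM-VAD uses or how the projection is computed. As a rough illustration only, the sketch below assumes the Poincaré ball model with curvature -c, maps a Euclidean embedding onto the ball with the exponential map at the origin, and measures geodesic distance between two mapped points. The function names `expmap0` and `poincare_distance` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def expmap0(v, c=1.0):
    """Exponential map at the origin: Euclidean vector -> point in the Poincare ball.

    Assumption: curvature is -c with c > 0; the paper's actual projection may differ.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]  # the origin maps to itself
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball (curvature -c)."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    # Clamp guards against rounding pushing the argument just below 1.
    return math.acosh(max(1.0, arg)) / math.sqrt(c)


# Toy stand-ins for caption-derived embeddings (hypothetical values).
general_scene = expmap0([0.1, 0.0])   # short vector: stays near the origin
specific_event = expmap0([1.5, 0.2])  # long vector: pushed toward the boundary
gap = poincare_distance(general_scene, specific_event)
```

In this model, distances grow rapidly near the boundary of the ball, which is why hyperbolic embeddings are often used for tree-like or hierarchical data: general concepts can sit near the origin while specific ones spread out toward the edge.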
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Video Analysis and Summarization
