Context-Aware Zero-Shot Anomaly Detection in Surveillance Using Contrastive and Predictive Spatiotemporal Modeling
Md. Rashid Shahriar Khan, Md. Abrar Hasan, Mohammod Tareq Aziz Justice

TL;DR
This paper presents a novel zero-shot anomaly detection framework for surveillance videos that combines spatiotemporal modeling and semantic understanding, enabling detection of unseen anomalies without prior examples.
Contribution
It introduces a hybrid architecture integrating TimeSformer, DPC, and CLIP for context-aware, zero-shot anomaly detection in complex surveillance environments.
Findings
Effective detection of unseen anomalies demonstrated
Outperforms existing zero-shot methods
Combines temporal and semantic modeling successfully
Abstract
Detecting anomalies in surveillance footage is inherently challenging due to their unpredictable and context-dependent nature. This work introduces a novel context-aware zero-shot anomaly detection framework that identifies abnormal events without exposure to anomaly examples during training. The proposed hybrid architecture combines TimeSformer, DPC, and CLIP to model spatiotemporal dynamics and semantic context. TimeSformer serves as the vision backbone to extract rich spatial-temporal features, while DPC forecasts future representations to identify temporal deviations. Furthermore, a CLIP-based semantic stream enables concept-level anomaly detection through context-specific text prompts. These components are jointly trained using InfoNCE and CPC losses, aligning visual inputs with their temporal and semantic representations. A context-gating mechanism further enhances decision-making…
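The abstract names InfoNCE as one of the training objectives that align visual inputs with their temporal and semantic representations. For readers unfamiliar with it, the following is a minimal, generic sketch of an InfoNCE loss over a batch of paired embeddings. It is not the authors' implementation: the function names, the in-batch negative sampling, and the temperature value of 0.07 are all assumptions made for illustration, and plain Python lists stand in for the feature tensors a real model would produce.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(queries, keys, temperature=0.07):
    """Mean InfoNCE loss over a batch (illustrative sketch).

    queries[i] and keys[i] form a positive pair; every other key in the
    batch is treated as a negative for queries[i]. The temperature value
    is an assumed default, not one taken from the paper.
    """
    q = [l2_normalize(x) for x in queries]
    k = [l2_normalize(x) for x in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarities scaled by the temperature.
        logits = [dot(qi, kj) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the positive key.
        total += log_denom - logits[i]
    return total / len(q)


if __name__ == "__main__":
    # Hypothetical toy embeddings: matched pairs vs. mismatched pairs.
    visual = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    mismatched = [[0.0, 1.0], [1.0, 0.0]]
    print("matched loss:   ", info_nce(visual, matched))
    print("mismatched loss:", info_nce(visual, mismatched))
```

In this sketch the loss is small when each query is most similar to its own key and grows when the pairing is wrong, which is the property that lets a contrastive objective pull a clip's visual features toward its matching temporal prediction or text prompt and away from the others in the batch.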
