Context-Aware Zero-Shot Anomaly Detection in Surveillance Using Contrastive and Predictive Spatiotemporal Modeling

Md. Rashid Shahriar Khan; Md. Abrar Hasan; Mohammod Tareq Aziz Justice

arXiv:2508.18463 · cs.CV · August 28, 2025

TL;DR

This paper presents a novel zero-shot anomaly detection framework for surveillance videos that combines spatiotemporal modeling and semantic understanding, enabling detection of unseen anomalies without prior examples.

Contribution

It introduces a hybrid architecture integrating TimeSformer, DPC, and CLIP for context-aware, zero-shot anomaly detection in complex surveillance environments.

Findings

01 Effective detection of unseen anomalies demonstrated

02 Outperforms existing zero-shot methods

03 Combines temporal and semantic modeling successfully

Abstract

Detecting anomalies in surveillance footage is inherently challenging due to their unpredictable and context-dependent nature. This work introduces a novel context-aware zero-shot anomaly detection framework that identifies abnormal events without exposure to anomaly examples during training. The proposed hybrid architecture combines TimeSformer, DPC, and CLIP to model spatiotemporal dynamics and semantic context. TimeSformer serves as the vision backbone to extract rich spatial-temporal features, while DPC forecasts future representations to identify temporal deviations. Furthermore, a CLIP-based semantic stream enables concept-level anomaly detection through context-specific text prompts. These components are jointly trained using InfoNCE and CPC losses, aligning visual inputs with their temporal and semantic representations. A context-gating mechanism further enhances decision-making…
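To make the contrastive objective named in the abstract more concrete, here is a minimal sketch of an InfoNCE loss and a prediction-based deviation score. It is an illustration under stated assumptions, not the authors' implementation: the function names, the temperature of 0.07, the use of plain Python lists as embeddings, and the choice of 1 minus cosine similarity as the deviation score are all hypothetical and do not come from the paper.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.07):
    """Mean InfoNCE loss over a batch.

    queries[i] and keys[i] form the positive pair; every other key in
    the batch acts as a negative. The temperature is an assumed value.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the positive key.
        total += log_denominator - logits[i]
    return total / len(q)


def deviation_score(predicted, observed):
    """Assumed scoring rule: 1 - cosine similarity between a forecast
    representation and the one actually observed."""
    return 1.0 - dot(l2_normalize(predicted), l2_normalize(observed))
```

In this sketch the same loss could pair a predicted future representation with the observed one (the predictive case) or a visual embedding with a text-prompt embedding (the semantic case). A low loss means each query is closest to its own key, and a high deviation score means the observed clip departs from what was forecast.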
