TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection
Yousuf Ahmed Siddiqui, Sufiyaan Usmani, Umer Tariq, Jawwad Ahmed Shamsi, and Muhammad Burhan Khan

TL;DR
This paper introduces TRACES, a real-time, context-aware zero-shot video anomaly detection system that uses memory-augmented cross-attention to adaptively identify new anomalies with high accuracy and explainability.
Contribution
The work presents a memory-augmented, cross-attention-based pipeline for real-time, zero-shot video anomaly detection that sets a new state of the art among zero-shot models on UCF-Crime and XD-Violence.
Findings
Achieves 90.4% AUC on UCF-Crime
Achieves 83.67% AP on XD-Violence
Operates in real time with high precision and explainability
Abstract
Video anomalies often depend on the available context and on how events evolve over time: an action that is normal in one context can be anomalous in another. Most anomaly detectors, however, ignore this kind of context, which severely limits their ability to generalize to new, real-world situations. Our work addresses the context-aware zero-shot anomaly detection challenge, in which systems must adapt to detect new events by correlating temporal and appearance features with textual memory traces in real time. Our approach defines a memory-augmented pipeline that correlates temporal signals with visual embeddings using cross-attention and performs real-time zero-shot anomaly classification by contextual similarity scoring. We achieve 90.4% AUC on UCF-Crime and 83.67% AP on XD-Violence, a new state of the art among zero-shot models. Our model achieves real-time…
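The abstract describes two mechanisms: cross-attention that fuses visual embeddings with temporal signals, and zero-shot classification by similarity to a textual memory. The sketch below illustrates that general idea in plain Python; it is not the TRACES implementation. All names (`fuse`, `anomaly_score`), the single-head attention, the residual sum used for fusion, the temperature of 0.07, and the toy 3-d vectors are assumptions made for illustration. In practice the embeddings would come from pretrained vision and text encoders, which this sketch does not model.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(v: Sequence[float]) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def cross_attention(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> Vector:
    """Single-head scaled dot-product attention: one visual query attends over temporal tokens."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def fuse(visual: Vector, temporal_tokens: Sequence[Vector]) -> Vector:
    """Hypothetical fusion: residual sum of the visual embedding and its attended temporal context."""
    attended = cross_attention(visual, temporal_tokens, temporal_tokens)
    return [v + a for v, a in zip(visual, attended)]


def anomaly_score(fused: Vector, memory: Sequence[Vector], is_anomalous: Sequence[bool],
                  temperature: float = 0.07) -> float:
    """Cosine similarity to each textual memory entry, softmax over entries,
    then the probability mass assigned to entries labeled anomalous."""
    f = normalize(fused)
    logits = [dot(f, normalize(m)) / temperature for m in memory]
    probs = softmax(logits)
    return sum(p for p, anom in zip(probs, is_anomalous) if anom)


if __name__ == "__main__":
    # Toy stand-ins for text embeddings of, e.g., "people walking" (normal)
    # and "person fighting" (anomalous).
    memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    labels = [False, True]
    temporal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    clip = [1.0, 0.0, 0.0]  # visual embedding closest to the "normal" entry
    print(anomaly_score(fuse(clip, temporal), memory, labels))
```

Because the memory is just a list of text embeddings with labels, new anomaly categories can be added at inference time without retraining, which is the property that makes similarity scoring suitable for a zero-shot setting.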
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
