Multi-Contextual Predictions with Vision Transformer for Video Anomaly Detection
Joo-Yeon Lee, Woo-Jeoung Nam, Seong-Whan Lee

TL;DR
This paper introduces a transformer-based model with multi-contextual prediction streams for video anomaly detection, effectively capturing spatio-temporal context to distinguish normal from abnormal events.
Contribution
The paper proposes a novel transformer architecture with three contextual prediction streams to improve understanding of spatio-temporal context in video anomaly detection.
Findings
Achieves competitive results on benchmark datasets
Effectively learns normality patterns via multi-contextual predictions
Outperforms some existing methods in anomaly detection accuracy
Abstract
Video Anomaly Detection (VAD) has traditionally been tackled with two main methodologies: the reconstruction-based approach and the prediction-based one. Because reconstruction-based methods learn to reproduce the input image, the model can end up learning little more than an identity function, which leads to the so-called generalizing issue. Prediction-based methods, on the other hand, learn to predict a future frame given several previous frames, so they are less sensitive to the generalizing issue. However, it is still uncertain whether such a model can learn the spatio-temporal context of a video. Our intuition is that understanding the spatio-temporal context of a video plays a vital role in VAD, as it provides precise information on how the appearance of an event in a video clip changes. Hence, to fully exploit the context information for anomaly detection in video circumstances, we designed the…
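As a rough illustration of the prediction-based idea described in the abstract (predict the next frame from previous ones, then treat a poorly predicted frame as suspicious), the sketch below scores frames by prediction error. It is not the paper's model: the `extrapolate` function is a hypothetical stand-in for a learned predictor, frames are assumed to be flat lists of pixel intensities in [0, 1], and the min-max normalization of errors is an assumption chosen for simplicity.

```python
def mse(a, b):
    """Mean squared error between two equally sized frames (flat lists of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def extrapolate(prev2, prev1):
    """Hypothetical stand-in for a learned predictor (assumption, not the paper's model).

    Linearly extrapolates each pixel from the two previous frames and
    clips the result to the valid intensity range [0, 1].
    """
    return [min(1.0, max(0.0, 2 * b - a)) for a, b in zip(prev2, prev1)]


def anomaly_scores(frames):
    """Return one score in [0, 1] per frame, starting from the third frame.

    A higher score means the frame was predicted worse, i.e. it deviates
    more from the pattern seen in the preceding frames.
    """
    errors = [
        mse(extrapolate(frames[t - 2], frames[t - 1]), frames[t])
        for t in range(2, len(frames))
    ]
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames were predicted equally well; nothing stands out.
        return [0.0] * len(errors)
    # Min-max normalize so the worst-predicted frame gets score 1.0.
    return [(e - lo) / (hi - lo) for e in errors]


# Toy clip: a steady brightening, then a frame that breaks the pattern.
frames = [
    [0.1, 0.1],
    [0.2, 0.2],
    [0.3, 0.3],
    [0.4, 0.4],
    [0.9, 0.1],  # deviates from the linear trend
]
scores = anomaly_scores(frames)
```

In this toy clip the last frame receives the highest score because it breaks the trend that the stand-in predictor extrapolates. A learned predictor would take the place of `extrapolate`, and the scoring step would stay conceptually the same.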
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Artificial Immune Systems Applications
Methods: Contrastive Language-Image Pre-training
