Delving into CLIP latent space for Video Anomaly Recognition
Luca Zanella, Benedetta Liberatori, Willi Menapace, Fabio Poiesi, Yiming Wang, Elisa Ricci

TL;DR
This paper presents AnomalyCLIP, a method that combines manipulation of CLIP's latent space with multiple instance learning and a Transformer architecture to detect and classify video anomalies at the frame level using only video-level supervision.
Contribution
The paper introduces an approach that uses CLIP's latent space together with a Transformer for temporal modelling to detect and classify anomalies in videos, and reports better results than existing methods.
Findings
Outperforms state-of-the-art methods on major benchmarks
Effectively identifies abnormal frames using text-driven directions
Utilizes a computationally efficient Transformer for temporal modeling
Abstract
We tackle the complex problem of detecting and recognising anomalies in surveillance videos at the frame level, utilising only video-level supervision. We introduce the novel method AnomalyCLIP, the first to combine Large Language and Vision (LLV) models, such as CLIP, with multiple instance learning for joint video anomaly detection and classification. Our approach specifically involves manipulating the latent CLIP feature space to identify the normal event subspace, which in turn allows us to effectively learn text-driven directions for abnormal events. When anomalous frames are projected onto these directions, they exhibit a large feature magnitude if they belong to a particular class. We also introduce a computationally efficient Transformer architecture to model short- and long-term temporal dependencies between frames, ultimately producing the final anomaly score and class…
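The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the normal event subspace is summarised by the mean of normal-frame features, and that each class direction is the unit vector from that mean to the class's text embedding. The function name `project_on_directions` and the toy 4-dimensional features are hypothetical; real CLIP embeddings would come from the CLIP image and text encoders.

```python
import numpy as np

def project_on_directions(frame_feats, normal_center, text_feats):
    """Project re-centred frame features onto unit-norm class directions.

    frame_feats:   (T, D) per-frame image embeddings
    normal_center: (D,)   assumed summary of the normal subspace (a mean here)
    text_feats:    (C, D) one text embedding per anomaly class
    returns:       (T, C) signed projection of each frame on each direction
    """
    # Assumption: a class direction points from the normal centre to the text embedding.
    directions = text_feats - normal_center
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    # Re-centre frames so that normal frames sit near the origin.
    centred = frame_feats - normal_center
    return centred @ directions.T

# Toy data standing in for CLIP features (D = 4, two anomaly classes).
normal_feats = np.array([[1.0, 0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0, 0.0]])
normal_center = normal_feats.mean(axis=0)

text_feats = np.array([[1.0, 1.0, 0.0, 0.0],   # hypothetical class 0
                       [1.0, 0.0, 1.0, 0.0]])  # hypothetical class 1

frame_feats = np.array([[1.0, 0.0, 0.0, 0.0],  # looks normal
                        [1.0, 3.0, 0.0, 0.0],  # lies along class 0
                        [1.0, 0.0, 2.0, 0.0]]) # lies along class 1

proj = project_on_directions(frame_feats, normal_center, text_feats)
magnitude = proj.max(axis=1)        # large value suggests an abnormal frame
predicted_class = proj.argmax(axis=1)
```

In this toy setting the normal frame projects to zero on every direction, while each abnormal frame has a large magnitude only on the direction of its own class, which mirrors the behaviour the abstract describes.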
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Artificial Immune Systems Applications
Methods: Multi-Head Attention · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Contrastive Language-Image Pre-training · Attention Is All You Need · Adam · Residual Connection · Layer Normalization
