Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
Zhiwei Yang, Jing Liu, Peng Wu

TL;DR
This paper introduces a framework that uses text prompts and normality guidance with CLIP for weakly supervised video anomaly detection, improving pseudo-label accuracy and temporal modeling.
Contribution
It proposes a new pseudo-label generation and self-training framework leveraging CLIP with a learnable text prompt and normality guidance for better video anomaly detection.
Findings
Achieves state-of-the-art results on UCF-Crime and XD-Violence datasets.
Improves pseudo-label accuracy through text-visual alignment.
Enhances temporal dependency learning with self-adaptive modules.
Abstract
Weakly supervised video anomaly detection (WSVAD) is a challenging task. Generating fine-grained pseudo-labels from weak labels and then self-training a classifier is currently a promising solution. However, existing methods use only the RGB visual modality and neglect category text information, which limits the accuracy of the generated pseudo-labels and degrades self-training performance. Inspired by the manual labeling process based on event descriptions, in this paper we propose a novel pseudo-label generation and self-training framework based on Text Prompt with Normality Guidance (TPWNG) for WSVAD. Our idea is to transfer the rich language-visual knowledge of the contrastive language-image pre-training (CLIP) model to align video event description text with the corresponding video frames and thereby generate pseudo-labels. Specifically, we…
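The core idea in the abstract, scoring each frame by how well it aligns with an event description, can be illustrated with a minimal sketch. This is not the authors' TPWNG implementation: the function names (`l2_normalize`, `frame_pseudo_labels`), the `temperature` value, and the two-way softmax over an "anomaly" text embedding and a "normal" text embedding are all assumptions made for illustration, and the inputs stand in for features that a CLIP image and text encoder would produce.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def frame_pseudo_labels(frame_feats, anomaly_text_feat, normal_text_feat,
                        temperature=0.07):
    """Hypothetical helper: frame-level pseudo-labels from text-visual alignment.

    frame_feats:        (T, D) array, one embedding per frame (assumed CLIP-like).
    anomaly_text_feat:  (D,) embedding of the anomaly event description.
    normal_text_feat:   (D,) embedding of a "normal" description.
    Returns a (T,) array of scores in [0, 1]; higher means more anomaly-like.
    """
    frames = l2_normalize(frame_feats)
    texts = l2_normalize(np.stack([anomaly_text_feat, normal_text_feat]))
    # Cosine similarity of every frame to both descriptions, sharpened by temperature.
    logits = frames @ texts.T / temperature            # shape (T, 2)
    # Numerically stable softmax over the two descriptions.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs[:, 0]                                 # probability of "anomaly"

# Usage with random stand-in features (real use would take encoder outputs).
rng = np.random.default_rng(0)
anomaly_text = rng.normal(size=16)
normal_text = rng.normal(size=16)
video = np.stack([anomaly_text, normal_text, anomaly_text + 0.1 * normal_text])
labels = frame_pseudo_labels(video, anomaly_text, normal_text)
```

Contrasting the anomaly description against a normal one is only one simple way to keep normal frames from receiving high scores; the normality guidance described in the paper may work differently.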
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Artificial Immune Systems Applications
Methods: Contrastive Language-Image Pre-training
