Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection
Jash Dalvi, Ali Dabouei, Gunjan Dhanuka, Min Xu

TL;DR
This paper introduces DAKD, a novel knowledge distillation method that aggregates multiple backbone representations for weakly-supervised video anomaly detection, achieving state-of-the-art results on several benchmarks.
Contribution
The paper proposes a bi-level distillation framework with a disentangled cross-attention network for improved anomaly detection in videos with weak supervision.
Findings
Achieves 1.36% improvement on UCF-Crime
Achieves 0.78% improvement on ShanghaiTech
Achieves 7.02% improvement on XD-Violence
Abstract
Video anomaly detection aims to develop automated models capable of identifying abnormal events in surveillance videos. The benchmark setup for this task is extremely challenging due to: i) the limited size of the training sets, ii) weak supervision provided in terms of video-level labels, and iii) intrinsic class imbalance induced by the scarcity of abnormal events. In this work, we show that distilling knowledge from aggregated representations of multiple backbones into a single-backbone Student model achieves state-of-the-art performance. In particular, we develop a bi-level distillation approach along with a novel disentangled cross-attention-based feature aggregation network. Our proposed approach, DAKD (Distilling Aggregated Knowledge with Disentangled Attention), demonstrates superior performance compared to existing methods across multiple benchmark datasets. Notably, we achieve…
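The core idea in the abstract, a single-backbone Student learning from a Teacher that aggregates several backbones, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say what the two distillation levels are, so the sketch assumes a feature-level term and a score-level term. A plain average stands in for the paper's disentangled cross-attention aggregation, and the names `mean_aggregate`, `distillation_loss`, and `alpha` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_aggregate(backbone_feats: List[Vector]) -> Vector:
    """Average the feature vectors produced by several backbones.

    Assumption: a simple mean replaces the paper's attention-based
    aggregation network, which the abstract does not specify in detail.
    """
    n = len(backbone_feats)
    return [sum(vals) / n for vals in zip(*backbone_feats)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_feat: Vector,
    teacher_feat: Vector,
    student_score: float,
    teacher_score: float,
    alpha: float = 0.5,  # hypothetical weight between the two terms
) -> float:
    """Two-term distillation loss (illustrative, not the paper's exact loss).

    The feature term pulls the Student's features toward the aggregated
    Teacher features; the score term pulls the Student's anomaly score
    toward the Teacher's score for the same video snippet.
    """
    feat_term = mse(student_feat, teacher_feat)
    score_term = (student_score - teacher_score) ** 2
    return alpha * feat_term + (1 - alpha) * score_term


# Two backbones each produce a 2-dimensional feature for one snippet.
teacher_feat = mean_aggregate([[1.0, 2.0], [3.0, 4.0]])
loss = distillation_loss([0.0, 0.0], teacher_feat, 0.2, 0.9)
```

In practice a loss of this kind would be combined with the weakly-supervised objective computed from video-level labels. That part is omitted here because the abstract gives no detail on it.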
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Artificial Immune Systems Applications
