DAMS: Dual-Branch Adaptive Multiscale Spatiotemporal Framework for Video Anomaly Detection
Dezhi An, Wenqiang Liu, Kefan Wang, Zening Chen, Jun Lu, Shengcai Zhang

TL;DR
The paper introduces DAMS, a dual-branch multiscale framework that combines hierarchical spatio-temporal feature learning with contrastive language-visual pre-training to improve video anomaly detection accuracy.
Contribution
It proposes a novel dual-path architecture integrating multilevel feature decoupling, adaptive multiscale temporal modeling, and cross-modal semantic alignment for enhanced anomaly detection.
Findings
Achieves state-of-the-art results on UCF-Crime and XD-Violence datasets.
Effectively models multiscale temporal dependencies and high-level semantics.
Demonstrates the benefit of combining hierarchical features with contrastive language-visual pre-training.
Abstract
Video anomaly detection aims to localize abnormal events in a video in both space and time. Video anomalies exhibit multiscale temporal dependencies and visual-semantic heterogeneity, and labeled data are scarce; together these make the task a challenging problem in computer vision. This study proposes a dual-path architecture, the Dual-Branch Adaptive Multiscale Spatiotemporal Framework (DAMS), built on multilevel feature decoupling and fusion. It models anomalies efficiently by integrating hierarchical feature learning with complementary information. The main processing path of the framework combines the Adaptive Multiscale Time Pyramid Network (AMTPN) with the Convolutional Block Attention Module (CBAM). AMTPN enables multigrained representation and dynamically weighted reconstruction of temporal features…
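The abstract is truncated here, so the actual AMTPN design is not shown. As a rough illustration of what "multigrained representation and dynamically weighted reconstruction of temporal features" could look like, the following is a minimal sketch and not the authors' implementation. It assumes a 1-D sequence of per-snippet feature values, average pooling at a few window sizes, nearest-neighbour upsampling, and softmax weights over scales. All function names, the window sizes, and the `scale_scores` parameter are hypothetical; in a trained model the scores would presumably be learned or predicted from the input rather than passed in by hand.

```python
import math


def avg_pool_1d(seq, window):
    """Average non-overlapping windows of a sequence (the last window may be shorter)."""
    pooled = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        pooled.append(sum(chunk) / len(chunk))
    return pooled


def upsample_nearest(coarse, window, length):
    """Repeat each coarse value so the sequence regains its original length."""
    return [coarse[min(t // window, len(coarse) - 1)] for t in range(length)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiscale_fuse(seq, windows=(1, 2, 4), scale_scores=(0.0, 0.0, 0.0)):
    """Pool at several temporal scales, then recombine with softmax weights.

    Assumption: `scale_scores` stands in for whatever mechanism produces the
    per-scale weights; equal scores give equal weights.
    """
    weights = softmax(list(scale_scores))
    branches = [
        upsample_nearest(avg_pool_1d(seq, w), w, len(seq)) for w in windows
    ]
    return [
        sum(wt * branch[t] for wt, branch in zip(weights, branches))
        for t in range(len(seq))
    ]
```

With equal scores, a single spike in the input is spread over neighbouring time steps in proportion to the coarser scales, while a constant sequence is returned unchanged. Raising one scale's score shifts the output toward that temporal granularity.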
