A Hierarchical Spatio-Temporal Graph Convolutional Neural Network for Anomaly Detection in Videos
Xianlin Zeng, Yalong Jiang, Wenrui Ding, Hongguang Li, Yafeng Hao, Zifeng Qiu

TL;DR
This paper introduces a hierarchical spatio-temporal graph convolutional neural network that captures multi-level interactions and scene understanding for improved anomaly detection in surveillance videos.
Contribution
It proposes a novel HSTGCNN model with multi-level graph representations and adaptive weighting for scene-specific anomaly detection, outperforming existing methods.
Findings
Significantly outperforms state-of-the-art models on four benchmark datasets.
Uses fewer learnable parameters than comparable models.
Effectively encodes both individual movements and interactions among identities.
Abstract
Deep learning models have been widely used for anomaly detection in surveillance videos. Typical models are trained to reconstruct normal videos, and the reconstruction error on an anomalous video indicates the extent of the abnormality. However, existing approaches suffer from two disadvantages. Firstly, they can only encode the movements of each identity independently, without considering the interactions among identities, which may also indicate anomalies. Secondly, they rely on inflexible models whose structures are fixed across different scenes; this configuration prevents them from understanding scenes. In this paper, we propose a Hierarchical Spatio-Temporal Graph Convolutional Neural Network (HSTGCNN) to address these problems. The HSTGCNN is composed of multiple branches that correspond to different levels of graph representations. High-level graph…
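The reconstruction-error idea that the abstract attributes to typical models can be illustrated with a minimal sketch. This is not the HSTGCNN method: the function names, the flat-list frame representation, the mean-squared-error measure, and the min-max normalisation are all assumptions chosen for illustration, and the reconstructions are hard-coded rather than produced by a trained model.

```python
from typing import List, Sequence


def reconstruction_errors(
    frames: Sequence[Sequence[float]],
    reconstructions: Sequence[Sequence[float]],
) -> List[float]:
    """Mean squared error between each frame and its reconstruction."""
    errors = []
    for frame, recon in zip(frames, reconstructions):
        squared = [(a - b) ** 2 for a, b in zip(frame, recon)]
        errors.append(sum(squared) / len(squared))
    return errors


def anomaly_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalise errors to [0, 1]; a higher score means more anomalous."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames were reconstructed equally well, so none stands out.
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


# Hypothetical data: each frame is a flattened feature vector, and the
# reconstructions stand in for the output of a model trained on normal video.
frames = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
reconstructions = [[0.0, 1.0, 2.0], [1.0, 1.0, 2.0], [1.0, 1.0, 1.0]]

scores = anomaly_scores(reconstruction_errors(frames, reconstructions))
```

In this toy input the third frame is reconstructed worst, so it receives the highest score; a model trained only on normal footage is expected to behave similarly on frames that deviate from what it has seen.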
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
