Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection
Xinyang Feng, Dongjin Song, Yuncong Chen, Zhengzhang Chen, Jingchao Ni, Haifeng Chen

TL;DR
This paper introduces CT-D2GAN, a novel unsupervised video anomaly detection method using convolutional transformers and dual discriminators to effectively model spatio-temporal patterns and detect anomalies in surveillance videos.
Contribution
The paper proposes a convolutional transformer architecture combined with dual discriminators for improved unsupervised video anomaly detection, explicitly capturing local and global temporal coherence.
Findings
Detects anomalies effectively on the UCSD Ped2, CUHK Avenue, and Shanghai Tech datasets.
Outperforms existing methods in accuracy and robustness.
Demonstrates the importance of local and global spatio-temporal modeling.
Abstract
Detecting abnormal activities in real-world surveillance videos is an important yet challenging task as the prior knowledge about video anomalies is usually limited or unavailable. Despite that many approaches have been developed to resolve this problem, few of them can capture the normal spatio-temporal patterns effectively and efficiently. Moreover, existing works seldom explicitly consider the local consistency at frame level and global coherence of temporal dynamics in video sequences. To this end, we propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection. Specifically, we first present a convolutional transformer to perform future frame prediction. It contains three key components, i.e., a convolutional encoder to capture the spatial information of the input video clips, a temporal…
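The abstract is cut off before it finishes listing the components, so the sketch below does not reproduce the authors' architecture. It only illustrates the generic transformer idea the summary relies on: scaled dot-product self-attention applied across the time axis of a clip, so that each frame's representation is mixed with those of the other frames. The function names, the 4-frame clip, and the 3-dimensional per-frame features are all hypothetical, and learned query/key/value projections are omitted to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frame_features):
    """Scaled dot-product self-attention across time steps.

    frame_features: list of T per-frame feature vectors (lists of d floats).
    Assumption: queries, keys, and values are the raw features themselves;
    a trained model would apply learned projections first.
    Returns (outputs, weights); weights[t][s] is how strongly frame t
    attends to frame s.
    """
    num_frames = len(frame_features)
    d = len(frame_features[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in frame_features:
        # Similarity of this frame to every frame in the clip.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in frame_features]
        w = softmax(scores)
        # Attention-weighted average of all frame features.
        out = [sum(w[s] * frame_features[s][j] for s in range(num_frames))
               for j in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Hypothetical per-frame features for a 4-frame clip (d = 3).
clip = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
outputs, weights = temporal_self_attention(clip)
```

In this toy clip the first frame puts more attention weight on the second frame, which has similar features, than on the third. In a convolutional transformer the per-frame features would typically be feature maps from a convolutional encoder rather than short vectors.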
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Layer Normalization · Byte Pair Encoding · Label Smoothing · Residual Connection · Dense Connections
