Forward Consistency Learning with Gated Context Aggregation for Video Anomaly Detection
Jiahao Lyu, Minghua Zhao, Xuewen Huang, Yifei Chen, Shuangli Du, Jing Hu, Cheng Shi, Zhiyong Lv

TL;DR
FoGA is a lightweight, real-time video anomaly detection model that uses forward consistency learning and gated context aggregation to improve accuracy while maintaining efficiency suitable for edge devices.
Contribution
The paper introduces FoGA, a novel lightweight VAD model with a gated context aggregation module and forward consistency loss, enabling efficient and accurate anomaly detection on resource-limited devices.
Findings
Outperforms state-of-the-art methods in accuracy.
Runs up to 155 FPS on standard hardware.
Balances detection accuracy against efficiency, using about 2M parameters.
Abstract
As a crucial element of public security, video anomaly detection (VAD) aims to measure deviations from normal patterns for various events in real-time surveillance systems. However, most existing VAD methods rely on large-scale models to pursue extreme accuracy, limiting their feasibility on resource-limited edge devices. Moreover, mainstream prediction-based VAD detects anomalies using only single-frame future prediction errors, overlooking the richer constraints from longer-term temporal forward information. In this paper, we introduce FoGA, a lightweight VAD model that performs Forward consistency learning with Gated context Aggregation, containing about 2M parameters and tailored for potential edge devices. Specifically, we propose a Unet-based method that performs feature extraction on consecutive frames to generate both immediate and forward predictions. Then, we introduce a gated…
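The abstract describes prediction-based VAD as scoring anomalies from future-frame prediction errors, and motivates adding a longer-term forward term. A minimal sketch of that scoring idea follows. It is not FoGA's implementation: the PSNR-based score with per-clip min-max normalisation is a common convention in prediction-based VAD, while `combine_errors` and its `weight` parameter are hypothetical names introduced here only to illustrate mixing an immediate and a forward error. Frames are flat lists of pixel values in [0, 1].

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized, flattened frames."""
    if len(pred) != len(target) or not pred:
        raise ValueError("frames must be non-empty and the same size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combine_errors(immediate_err, forward_err, weight=0.5):
    """Blend the single-step error with a longer-horizon (forward) error.

    Assumption: a simple convex combination; the source does not state
    how FoGA combines the two predictions at test time.
    """
    return (1.0 - weight) * immediate_err + weight * forward_err


def psnr(error, max_val=1.0, eps=1e-10):
    """Peak signal-to-noise ratio (dB) for a given mean squared error."""
    # Clamp the error so a perfect prediction yields a large finite value.
    return 10.0 * math.log10(max_val ** 2 / max(error, eps))


def anomaly_scores(psnrs):
    """Min-max normalise PSNR over a clip and invert it.

    A low PSNR (poor prediction) maps to a score near 1 (anomalous).
    """
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [0.0 for _ in psnrs]
    return [1.0 - (v - lo) / (hi - lo) for v in psnrs]


if __name__ == "__main__":
    target = [0.5, 0.5, 0.5, 0.5]
    predictions = [
        [0.5, 0.5, 0.5, 0.5],  # well-predicted frame
        [0.4, 0.6, 0.5, 0.5],  # small error
        [0.0, 0.0, 0.0, 0.0],  # poorly predicted frame
    ]
    clip_psnr = [psnr(mse(p, target)) for p in predictions]
    print(anomaly_scores(clip_psnr))
```

In this sketch, frames the predictor reconstructs poorly receive scores near 1 and well-predicted frames receive scores near 0; a threshold on that score would flag anomalies.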
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Video Analysis and Summarization
