TL;DR
ACDnet is a real-time, resource-efficient action detection network for edge devices that leverages flow-guided feature approximation and memory aggregation to maintain high accuracy and speed.
Contribution
The paper introduces ACDnet, a novel compact action detection network that efficiently exploits temporal coherence and memory aggregation for real-time edge computing.
Findings
Achieves over 75 FPS on benchmark datasets.
Maintains accuracy competitive with that of heavier models.
Demonstrates robustness in real-world scenarios.
Abstract
Interpreting human actions requires understanding the spatial and temporal context of a scene. State-of-the-art action detectors based on Convolutional Neural Networks (CNNs) have achieved remarkable results by adopting two-stream or 3D CNN architectures. However, these methods typically operate offline rather than in real time, because reasoning over spatio-temporal information makes the systems complex. Their high computational cost is therefore incompatible with emerging real-world scenarios such as service robots or public surveillance, where detection must run on resource-limited edge devices. In this paper, we propose ACDnet, a compact action detection network for real-time edge computing that addresses both efficiency and accuracy. It exploits the temporal coherence between successive video frames to approximate their CNN features rather than naively…
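The abstract's core idea is to approximate a frame's CNN features from a nearby key frame instead of recomputing them. The minimal NumPy sketch below illustrates this under stated assumptions; it is not ACDnet's actual implementation. It assumes feature maps of shape (C, H, W) and a dense flow field of shape (2, H, W) holding per-pixel (x, y) displacements from the current frame back to the key frame. Bilinear sampling serves as the warping operator. The names `warp_features` and `aggregate_memory` are hypothetical. The exponential moving average is only a stand-in for the paper's memory aggregation, whose exact form is not given here.

```python
import numpy as np

def warp_features(key_feat, flow):
    """Approximate current-frame features by bilinearly sampling key-frame
    features at positions displaced by the flow (assumed layout: flow[0] = dx,
    flow[1] = dy, pointing from the current frame into the key frame)."""
    C, H, W = key_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling positions in the key frame, clamped to the feature-map border.
    src_x = np.clip(xs + flow[0], 0, W - 1)
    src_y = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = src_x - x0  # horizontal interpolation weight, shape (H, W)
    wy = src_y - y0  # vertical interpolation weight, shape (H, W)
    # Interpolate along x on the two neighbouring rows, then along y.
    top = key_feat[:, y0, x0] * (1 - wx) + key_feat[:, y0, x1] * wx
    bottom = key_feat[:, y1, x0] * (1 - wx) + key_feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def aggregate_memory(memory, current, alpha=0.5):
    """Hypothetical memory update: blend new features into a running memory
    with an exponential moving average (stand-in for ACDnet's aggregation)."""
    if memory is None:
        return current.copy()
    return alpha * current + (1 - alpha) * memory
```

In such a scheme, the expensive backbone would run only on key frames. Each intermediate frame would reuse the key-frame features through `warp_features`, using a cheap flow estimate, and pass the result through `aggregate_memory` before the detection head. This is where the intended savings on edge hardware would come from.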
Taxonomy
Methods: 3D Convolutional Neural Network · Non-Maximum Suppression · Convolution · 1x1 Convolution · SSD
