CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge
Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato

TL;DR
This paper presents CA3D, a deep learning model that combines convolutional layers with a linear-complexity attention mechanism for video activity recognition on edge devices, aiming for competitive accuracy at reduced computational cost.
Contribution
Introduces CA3D, a convolutional-attentional 3D network, together with a new quantization mechanism that improves efficiency during both training and inference on resource-constrained devices.
Findings
Achieves competitive accuracy on benchmark datasets.
Reduces computational cost compared to existing models.
Maintains robust learning and generalization capabilities.
Abstract
In this paper, we introduce a deep learning solution for video activity recognition that leverages an innovative combination of convolutional layers with a linear-complexity attention mechanism. Moreover, we introduce a novel quantization mechanism to further improve the efficiency of our model during both training and inference. Our model maintains a reduced computational cost, while preserving robust learning and generalization capabilities. Our approach addresses the issues related to the high computing requirements of current models, with the goal of achieving competitive accuracy on consumer and edge devices, enabling smart home and smart healthcare applications where efficiency and privacy issues are of concern. We experimentally validate our model on different established and publicly available video activity recognition benchmarks, improving accuracy over alternative models at a…
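The abstract does not say which linear-complexity attention mechanism CA3D uses, so the following is only a generic sketch of one well-known family (kernel feature-map attention), not the authors' method. It assumes an `elu(x) + 1` feature map, and the function names `feature_map`, `linear_attention`, and `quadratic_attention` are hypothetical. The point it illustrates is that summing keys and values once, before touching the queries, gives the same result as the pairwise form while scaling linearly in the number of tokens.

```python
import math


def feature_map(x):
    # Assumed positive feature map: elu(x) + 1 (an illustrative choice,
    # not taken from the paper).
    return x + 1.0 if x > 0 else math.exp(x)


def _phi(rows):
    return [[feature_map(x) for x in row] for row in rows]


def linear_attention(Q, K, V):
    """Kernel attention in O(n * d * d_v) time.

    Q, K are lists of n vectors of size d; V is a list of n vectors of size d_v.
    """
    phi_q, phi_k = _phi(Q), _phi(K)
    d, d_v = len(K[0]), len(V[0])
    # Accumulate S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j) once.
    S = [[0.0] * d_v for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(phi_k, V):
        for a in range(d):
            z[a] += k[a]
            for b in range(d_v):
                S[a][b] += k[a] * v[b]
    out = []
    for q in phi_q:
        denom = sum(q[a] * z[a] for a in range(d))
        out.append(
            [sum(q[a] * S[a][b] for a in range(d)) / denom for b in range(d_v)]
        )
    return out


def quadratic_attention(Q, K, V):
    """Same kernel attention, computed pairwise in O(n^2) for comparison."""
    phi_q, phi_k = _phi(Q), _phi(K)
    d_v = len(V[0])
    out = []
    for q in phi_q:
        weights = [sum(qa * ka for qa, ka in zip(q, k)) for k in phi_k]
        total = sum(weights)
        out.append(
            [sum(w * v[b] for w, v in zip(weights, V)) / total for b in range(d_v)]
        )
    return out
```

Both functions return the same values up to floating-point error. Only the order of summation differs, which is what removes the quadratic dependence on sequence length. For video, the tokens would typically be spatio-temporal positions of a feature volume, though how CA3D forms its tokens is not stated in the text above.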
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Context-Aware Activity Recognition Systems
Methods: Softmax · Attention Is All You Need
