CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge

Gabriele Lagani; Fabrizio Falchi; Claudio Gennaro; Giuseppe Amato

arXiv:2505.19928 · cs.CV · May 27, 2025

Open Access

TL;DR

This paper presents CA3D, a deep learning model that combines convolutional layers with a linear-complexity attention mechanism for video activity recognition on edge devices, aiming for competitive accuracy at reduced computational cost.

Contribution

Introduction of CA3D, a convolutional-attentional 3D network with a new quantization method for efficient, accurate video activity recognition on resource-constrained devices.

Findings

01. Achieves competitive accuracy on benchmark datasets.

02. Reduces computational cost compared to existing models.

03. Maintains robust learning and generalization capabilities.

Abstract

In this paper, we introduce a deep learning solution for video activity recognition that leverages an innovative combination of convolutional layers with a linear-complexity attention mechanism. Moreover, we introduce a novel quantization mechanism to further improve the efficiency of our model during both training and inference. Our model maintains a reduced computational cost, while preserving robust learning and generalization capabilities. Our approach addresses the issues related to the high computing requirements of current models, with the goal of achieving competitive accuracy on consumer and edge devices, enabling smart home and smart healthcare applications where efficiency and privacy issues are of concern. We experimentally validate our model on different established and publicly available video activity recognition benchmarks, improving accuracy over alternative models at a…
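The abstract does not say which linear-complexity attention mechanism the model uses, so the following is only a generic sketch of the idea, not the authors' method. It assumes a kernelized attention with the feature map `elu(x) + 1`, which is one common choice. The function names `feature_map`, `linear_attention`, and `quadratic_reference` are hypothetical. The point it illustrates: by summing the key-value products once, the cost grows linearly with the number of tokens instead of quadratically.

```python
import math


def feature_map(x):
    """elu(x) + 1, a strictly positive feature map (an assumption, not the paper's choice)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Kernelized attention in O(n * d * dv) time for n tokens."""
    d = len(keys[0])
    dv = len(values[0])
    # One pass over the keys/values: S = sum_j phi(k_j) v_j^T, z = sum_j phi(k_j).
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(keys, values):
        phi_k = feature_map(k)
        for a in range(d):
            z[a] += phi_k[a]
            for b in range(dv):
                S[a][b] += phi_k[a] * v[b]
    # One pass over the queries: out_i = phi(q_i)^T S / (phi(q_i)^T z).
    outputs = []
    for q in queries:
        phi_q = feature_map(q)
        norm = sum(phi_q[a] * z[a] for a in range(d))
        outputs.append(
            [sum(phi_q[a] * S[a][b] for a in range(d)) / norm for b in range(dv)]
        )
    return outputs


def quadratic_reference(queries, keys, values):
    """Same attention computed pairwise in O(n^2); used only to check the result."""
    dv = len(values[0])
    outputs = []
    for q in queries:
        phi_q = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(phi_q, feature_map(k))) for k in keys
        ]
        total = sum(weights)
        outputs.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / total for b in range(dv)]
        )
    return outputs
```

Both functions return the same values up to floating-point rounding. Only the order of the summations differs, which is what removes the pairwise token-to-token term.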

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Context-Aware Activity Recognition Systems

Methods: Softmax · Attention Is All You Need