TransNet: A Transfer Learning-Based Network for Human Action Recognition
K. Alomar, X. Cai

TL;DR
TransNet is a novel, efficient deep learning architecture for human action recognition that leverages transfer learning by decomposing 3D-CNNs into 2D and 1D components, improving speed and accuracy.
Contribution
The paper introduces TransNet, a simple end-to-end model that decomposes 3D-CNNs into 2D and 1D CNNs, enabling effective transfer learning for HAR.
Findings
TransNet outperforms state-of-the-art models in accuracy.
TransNet reduces training time and model complexity.
TransNet demonstrates high flexibility and effectiveness.
Abstract
Human action recognition (HAR) is a high-level and significant research area in computer vision due to its ubiquitous applications. The main limitations of the current HAR models are their complex structures and lengthy training time. In this paper, we propose a simple yet versatile and effective end-to-end deep learning architecture, coined as TransNet, for HAR. TransNet decomposes the complex 3D-CNNs into 2D- and 1D-CNNs, where the 2D- and 1D-CNN components extract spatial features and temporal patterns in videos, respectively. Benefiting from its concise architecture, TransNet is ideally compatible with any pretrained state-of-the-art 2D-CNN models in other fields, being transferred to serve the HAR task. In other words, it naturally leverages the power and success of transfer learning for HAR, bringing huge advantages in terms of efficiency and effectiveness. Extensive experimental…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
