Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification
Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool

TL;DR
This paper introduces a novel 3D CNN architecture with a new temporal layer for improved video classification and proposes a transfer learning method from 2D to 3D CNNs, achieving state-of-the-art results.
Contribution
The paper presents a new temporal layer for 3D CNNs and a transfer learning technique from 2D to 3D CNNs, enhancing training efficiency and performance.
Findings
T3D outperforms current state-of-the-art methods on HMDB51, UCF101, and Kinetics datasets.
Transfer learning from 2D to 3D CNNs reduces training data requirements.
The proposed methods improve accuracy over existing approaches.
Abstract
The work in this paper is driven by the question of how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches with fixed temporal convolution kernel depths. We introduce a new temporal layer that models variable temporal convolution kernel depths. We embed this new temporal layer in our proposed 3D CNN. We extend the DenseNet architecture - which normally is 2D - with 3D filters and pooling kernels. We name our proposed video convolutional network 'Temporal 3D ConvNet' (T3D) and its new temporal layer 'Temporal Transition Layer' (TTL). Our experiments show that T3D outperforms the current state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. The other issue in training 3D ConvNets is about training them from scratch with a…
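To make "variable temporal convolution kernel depths" concrete, here is a toy sketch of one plausible reading: several temporal filters of different depths run in parallel over the same sequence, and their outputs are stacked per time step. Everything specific in it is an assumption for illustration and is not taken from the paper: the depths (1, 3, 5), the fixed averaging kernels (a real layer would learn its weights and operate on 3D feature maps), and the hypothetical function names `temporal_conv` and `multi_depth_temporal_layer`.

```python
# Toy illustration only: parallel temporal filters with different kernel
# depths over a 1D sequence of per-frame values, stacked per time step.
# Depths, averaging kernels and names are assumptions, not the paper's TTL.

def temporal_conv(seq, depth):
    """Average seq over a sliding window of odd size `depth`.

    Zero padding on both sides keeps the output as long as the input.
    """
    pad = depth // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(padded[t:t + depth]) / depth for t in range(len(seq))]


def multi_depth_temporal_layer(seq, depths=(1, 3, 5)):
    """Run one temporal filter per depth and stack the results.

    Returns one feature vector per time step with len(depths) channels.
    """
    branches = [temporal_conv(seq, d) for d in depths]
    return [list(step) for step in zip(*branches)]


# A single "event" at frame 2 of a 5-frame clip.
frames = [0.0, 0.0, 1.0, 0.0, 0.0]
features = multi_depth_temporal_layer(frames)
for t, channels in enumerate(features):
    print(t, [round(c, 3) for c in channels])
```

In this toy run the depth-1 channel responds only at the frame where the event occurs, while the deeper channels spread the response over neighbouring frames. That is the sense in which mixing kernel depths lets a layer see both short and long temporal extents at once.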
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Gait Recognition and Analysis
Methods: Convolution
