Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

Ali Diba; Mohsen Fayyaz; Vivek Sharma; Amir Hossein Karami; Mohammad Mahdi Arzani; Rahman Yousefzadeh; Luc Van Gool

arXiv:1711.08200 · cs.CV · November 23, 2017 · 187 citations

TL;DR

This paper introduces a 3D CNN architecture with a new temporal layer for video classification and proposes a method for transferring knowledge from 2D to 3D CNNs, reporting state-of-the-art results on HMDB51, UCF101, and Kinetics.

Contribution

The paper presents a new temporal layer for 3D CNNs, the Temporal Transition Layer (TTL), and a technique for transferring knowledge from pre-trained 2D CNNs to 3D CNNs, so that 3D networks do not have to be trained entirely from scratch.
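The summary does not spell out how the 2D-to-3D transfer is set up. One way to frame it, assumed here for illustration only, is a correspondence task: a frozen, pre-trained 2D network sees a frame, the 3D network sees a clip, and a binary head is trained to predict whether the two come from the same video, so that only the 3D network learns. The sketch below covers just the pair-sampling step of such a setup; `sample_pairs`, the single-frame inputs, and the alternating positive/negative split are hypothetical choices, not the paper's code.

```python
import random
from typing import List, Tuple

# (frame_video, frame_idx, clip_video, clip_start, label)
Pair = Tuple[int, int, int, int, int]


def sample_pairs(video_lengths: List[int], clip_len: int, n_pairs: int,
                 seed: int = 0) -> List[Pair]:
    """Hypothetical pair sampler for a 2D-frame / 3D-clip correspondence task.

    Each video is described only by its frame count. A clip is the window of
    frame indices [clip_start, clip_start + clip_len).
    label 1: the frame lies inside the clip, in the same video.
    label 0: the frame comes from a different video than the clip.
    """
    if len(video_lengths) < 2:
        raise ValueError("need at least two videos to build negative pairs")
    if min(video_lengths) < clip_len:
        raise ValueError("every video must have at least clip_len frames")
    rng = random.Random(seed)
    n_videos = len(video_lengths)
    pairs: List[Pair] = []
    for i in range(n_pairs):
        clip_video = rng.randrange(n_videos)
        clip_start = rng.randrange(video_lengths[clip_video] - clip_len + 1)
        if i % 2 == 0:
            # Positive: pick a frame inside the clip, so the frozen 2D net and
            # the 3D net see the same moment of the same video.
            frame_video = clip_video
            frame_idx = clip_start + rng.randrange(clip_len)
            label = 1
        else:
            # Negative: pick any frame from a different video.
            frame_video = rng.choice([v for v in range(n_videos) if v != clip_video])
            frame_idx = rng.randrange(video_lengths[frame_video])
            label = 0
        pairs.append((frame_video, frame_idx, clip_video, clip_start, label))
    return pairs


if __name__ == "__main__":
    for pair in sample_pairs([40, 50, 60], clip_len=16, n_pairs=4, seed=1):
        print(pair)
```

For example, `sample_pairs([40, 50, 60], clip_len=16, n_pairs=10)` yields five positive and five negative index tuples, which a data loader could turn into (frame, clip) tensors for the two networks.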

Findings

01. T3D outperforms prior state-of-the-art methods on the HMDB51, UCF101, and Kinetics datasets.
02. Transfer learning from 2D to 3D CNNs reduces the amount of training data the 3D network requires.
03. The proposed temporal layer and transfer technique improve accuracy over existing approaches.

Abstract

The work in this paper is driven by the question of how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches with fixed temporal convolution kernel depths. We introduce a new temporal layer that models variable temporal convolution kernel depths. We embed this new temporal layer in our proposed 3D CNN. We extend the DenseNet architecture, which is normally 2D, with 3D filters and pooling kernels. We name our proposed video convolutional network 'Temporal 3D ConvNet' (T3D) and its new temporal layer 'Temporal Transition Layer' (TTL). Our experiments show that T3D outperforms the current state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. The other issue in training 3D ConvNets is about training them from scratch with a…
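The abstract describes the TTL only as a layer that models variable temporal convolution kernel depths inside a 3D DenseNet. A minimal NumPy sketch, assuming the layer runs several 3D convolutions with different temporal depths in parallel on the same input, concatenates their outputs along the channel axis, and then downsamples with 3D average pooling. The temporal depths (1, 3, 6), the 3×3 spatial kernel size, the use of average pooling, the omission of batch normalization and nonlinearities, and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def conv3d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive stride-1 3D convolution (cross-correlation) with 'same' zero padding.

    x: (C_in, T, H, W); w: (C_out, C_in, kt, kh, kw). Even kernel sizes are
    padded asymmetrically so the output keeps the input's T, H, W.
    """
    c_out, c_in, kt, kh, kw = w.shape
    if x.shape[0] != c_in:
        raise ValueError("channel mismatch between input and kernel")
    pads = [(0, 0)] + [((k - 1) // 2, k - 1 - (k - 1) // 2) for k in (kt, kh, kw)]
    xp = np.pad(x, pads)  # zero padding on T, H, W only
    _, t, h, wd = x.shape
    out = np.zeros((c_out, t, h, wd))
    for i in range(t):
        for j in range(h):
            for k in range(wd):
                patch = xp[:, i:i + kt, j:j + kh, k:k + kw]
                # Contract over (C_in, kt, kh, kw) to get one value per output channel.
                out[:, i, j, k] = np.tensordot(w, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return out


def avg_pool3d(x: np.ndarray, k: int = 2) -> np.ndarray:
    """Non-overlapping k x k x k average pooling over (T, H, W); remainders are cropped."""
    c, t, h, w = x.shape
    x = x[:, : t - t % k, : h - h % k, : w - w % k]
    return x.reshape(c, t // k, k, h // k, k, w // k, k).mean(axis=(2, 4, 6))


def temporal_transition_layer(x: np.ndarray, kernels: list, pool: int = 2) -> np.ndarray:
    """Hypothetical TTL: parallel 3D convs with different temporal depths,
    concatenated on the channel axis, followed by 3D average pooling."""
    branches = [conv3d_same(x, w) for w in kernels]
    return avg_pool3d(np.concatenate(branches, axis=0), pool)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 6, 6))  # (channels, frames, height, width)
    temporal_depths = (1, 3, 6)            # hypothetical depths
    kernels = [0.1 * rng.standard_normal((2, 4, d, 3, 3)) for d in temporal_depths]
    y = temporal_transition_layer(x, kernels)
    print(y.shape)  # (6, 4, 3, 3)
```

Each branch keeps the clip length, so branches with short and long temporal receptive fields can be stacked channel-wise before pooling halves the temporal and spatial resolution; a real layer would use an optimized convolution routine rather than these Python loops.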

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Gait Recognition and Analysis

Methods: Convolution