Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

Kensho Hara; Hirokatsu Kataoka; Yutaka Satoh

arXiv:1708.07632·cs.CV·August 28, 2017·79 cites

TL;DR

This paper introduces 3D Residual Networks (3D ResNets) for action recognition in videos. Evaluated on the ActivityNet and Kinetics datasets, the models trained on Kinetics did not overfit despite their large number of parameters and outperformed shallower 3D CNNs.

Contribution

The paper proposes a 3D ResNet architecture for video action recognition. It enables deeper 3D CNNs that, when trained on Kinetics, outperform shallow models such as C3D without overfitting.

Findings

01 3D ResNets outperform relatively shallow 3D CNNs such as C3D.

02 3D ResNets trained on Kinetics did not suffer from overfitting despite their large number of parameters.

03 The 3D ResNets are evaluated experimentally on the ActivityNet and Kinetics datasets.

Abstract

Convolutional neural networks with spatio-temporal 3D kernels (3D CNNs) can directly extract spatio-temporal features from videos for action recognition. Although 3D kernels tend to overfit because of their large number of parameters, 3D CNNs have improved greatly with the use of recent huge video databases. However, 3D CNN architectures remain relatively shallow compared with the very deep networks that have succeeded among 2D CNNs, such as residual networks (ResNets). In this paper, we propose 3D CNNs based on ResNets toward a better action representation. We describe the training procedure of our 3D ResNets in detail. We experimentally evaluate the 3D ResNets on the ActivityNet and Kinetics datasets. The 3D ResNets trained on Kinetics did not suffer from overfitting despite the large number of parameters of the model, and achieved better performance than…
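
The abstract's remark that 3D kernels carry many parameters can be illustrated with a little arithmetic. The sketch below counts the weights in a single convolution layer for a 2D kernel and for a 3D kernel. The kernel sizes (3x3 and 3x3x3), the channel width of 64, and the helper name conv_weight_count are assumptions chosen only for illustration; none of them is taken from this page.

```python
def conv_weight_count(in_channels, out_channels, kernel_size):
    """Count the weights of one convolution layer, ignoring the bias.

    kernel_size is a tuple: (h, w) for a 2D kernel, (t, h, w) for a 3D kernel.
    """
    count = in_channels * out_channels
    for extent in kernel_size:
        count *= extent
    return count


# Assumed layer width, chosen only for illustration.
channels = 64

params_2d = conv_weight_count(channels, channels, (3, 3))     # spatial kernel
params_3d = conv_weight_count(channels, channels, (3, 3, 3))  # spatio-temporal kernel

print("2D layer weights:", params_2d)
print("3D layer weights:", params_3d)
print("ratio 3D / 2D:", params_3d / params_2d)
```

Under these assumptions the added temporal extent multiplies the per-layer weight count by the temporal kernel size, which gives a sense of why a deep 3D network benefits from a large training set.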

Code & Models

Repositories

kenshohara/3D-ResNets · torch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis