Human Action Recognition using Factorized Spatio-Temporal Convolutional Networks

Lin Sun; Kui Jia; Dit-Yan Yeung; Bertram E. Shi

arXiv:1510.00562 · cs.CV · October 5, 2015 · 112 cites

TL;DR

This paper introduces a factorized spatio-temporal convolutional network (FstCN) for human action recognition in videos. Rather than learning full 3D convolution kernels, FstCN learns spatial and temporal kernels separately, which improves performance on the UCF-101 and HMDB-51 benchmarks.

Contribution

The paper proposes a novel factorized 3D CNN architecture with a transformation operator, enhancing training efficiency and accuracy in human action recognition tasks.

Findings

01 FstCN outperforms existing CNN-based methods on the UCF-101 and HMDB-51 datasets.

02 FstCN achieves results comparable to methods that use auxiliary training videos.

03 The approach handles sequence alignment through a new sampling strategy.

Abstract

Human actions in video sequences are three-dimensional (3D) spatio-temporal signals characterizing both the visual appearance and motion dynamics of the involved humans and objects. Inspired by the success of convolutional neural networks (CNN) for image classification, recent attempts have been made to learn 3D CNNs for recognizing human actions in videos. However, partly due to the high complexity of training 3D convolution kernels and the need for large quantities of training videos, only limited success has been reported. This has triggered us to investigate in this paper a new deep architecture which can handle 3D signals more effectively. Specifically, we propose factorized spatio-temporal convolutional networks (FstCN) that factorize the original 3D convolution kernel learning as a sequential process of learning 2D spatial kernels in the lower layers (called spatial convolutional…
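The factorization idea in the abstract can be illustrated with a small sketch. It is not the FstCN architecture, which the abstract describes only in part. It assumes a single 3×3 spatial kernel followed by a single length-3 temporal kernel, with sizes, variable names, and the helper `conv3d_valid` all chosen here for illustration. Under those assumptions, applying the 2D kernel to each frame and then the 1D kernel along time gives the same output as one 3D kernel formed by their outer product, with fewer weights to learn.

```python
import random

def conv3d_valid(video, kernel):
    """Naive 'valid' 3D cross-correlation on nested lists indexed [t][y][x].

    Hypothetical helper for illustration only; not from the paper.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [[[sum(video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                  for dt in range(kt) for dy in range(kh) for dx in range(kw))
              for x in range(W - kw + 1)]
             for y in range(H - kh + 1)]
            for t in range(T - kt + 1)]

random.seed(0)
T, H, W = 5, 6, 6  # assumed toy clip size: 5 frames of 6x6 pixels
video = [[[random.random() for _ in range(W)] for _ in range(H)]
         for _ in range(T)]

# Assumed kernel sizes: one 3x3 spatial kernel, one length-3 temporal kernel.
spatial = [[random.random() for _ in range(3)] for _ in range(3)]
temporal = [random.random() for _ in range(3)]

# Full 3D kernel built as the outer product temporal[dt] * spatial[dy][dx].
full = [[[temporal[dt] * spatial[dy][dx] for dx in range(3)]
         for dy in range(3)]
        for dt in range(3)]

# Direct route: one 3x3x3 kernel over the whole clip.
direct = conv3d_valid(video, full)

# Factorized route: 2D spatial kernel per frame (shape 1x3x3),
# then 1D temporal kernel along the time axis (shape 3x1x1).
spatial_out = conv3d_valid(video, [spatial])
factorized = conv3d_valid(spatial_out, [[[w]] for w in temporal])

# Largest absolute difference between the two outputs.
max_diff = max(abs(a - b)
               for plane_a, plane_b in zip(direct, factorized)
               for row_a, row_b in zip(plane_a, plane_b)
               for a, b in zip(row_a, row_b))

params_full = 3 * 3 * 3        # weights in the full 3D kernel
params_factorized = 3 * 3 + 3  # weights in the 2D kernel plus the 1D kernel
print(max_diff, params_full, params_factorized)
```

The two routes agree only because the 3D kernel here is an exact outer product. A general 3D kernel is not separable in this way, so a factorized network trades some expressiveness for fewer parameters per kernel.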

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods

Methods: 3D Convolution · Convolution