Learning Representative Temporal Features for Action Recognition

Ali Javidani; Ahmad Mahmoudi-Aznaveh

arXiv:1802.06724 · cs.CV · March 29, 2022

TL;DR

This paper introduces a lightweight video classification approach that extracts temporal features using a 1D-CNN on PCA-reduced features from pre-trained CNNs, enabling effective recognition with limited training data.

Contribution

The novel approach combines pre-trained spatial features with a 1D-CNN for temporal classification, reducing training complexity and data requirements.

Findings

1. Achieved state-of-the-art results on the UCF11 and jHMDB datasets.
2. Performed competitively on the HMDB51 dataset.
3. Significantly reduced the number of trainable parameters.

Abstract

This paper presents a novel video classification method for efficiently recognizing different categories of third-person videos. Our motivation is a lightweight model that can be trained when training data are scarce. To this end, processing of the 3-dimensional video input is decomposed into 1D processing along the temporal dimension on top of 2D processing in the spatial dimensions. The 2D spatial frames are handled by pre-trained networks, with no training phase. The only step that involves training is classifying the 1D time series that results from describing the 2D signals. Specifically, optical flow images are first computed from consecutive frames and described by pre-trained CNNs. Their dimensionality is then reduced using PCA. By stacking the description vectors side by side, a multi-channel time series is created for each video.…
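
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Every size (30 frames, 512-dimensional descriptors, 8 PCA components, 4 filters of width 5) is an assumption chosen for illustration, the per-frame descriptors are random stand-ins for the outputs of a pre-trained CNN applied to optical flow images, and the 1D convolution filters are random rather than trained. PCA is fitted here on the frames of a single video for brevity; fitting it over a whole training set is the more likely setup.

```python
import numpy as np

def pca_reduce(features, k):
    """Project row vectors onto their top-k principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def conv1d_valid(series, kernels):
    """Valid 1D convolution (cross-correlation) along the time axis.

    series:  (channels, T)
    kernels: (n_filters, channels, width)
    returns: (n_filters, T - width + 1)
    """
    n_filters, _, width = kernels.shape
    steps = series.shape[1] - width + 1
    out = np.empty((n_filters, steps))
    for t in range(steps):
        window = series[:, t:t + width]  # (channels, width)
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)
T, D, K = 30, 512, 8  # frames, descriptor size, PCA components (all assumed)

# Stand-in for CNN descriptors of optical flow images, one row per frame.
descriptors = rng.normal(size=(T, D))

reduced = pca_reduce(descriptors, K)   # (T, K)
time_series = reduced.T                # (K, T): one channel per PCA component

# Random, untrained filters; in the described method this is the trained part.
kernels = rng.normal(size=(4, K, 5))
activations = np.maximum(conv1d_valid(time_series, kernels), 0.0)  # ReLU
video_feature = activations.max(axis=1)  # global max pooling over time
```

Only the temporal filters would need to be learned in this arrangement, which is consistent with the stated goal of a small model with few trainable parameters.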
