Spatio-Temporal FAST 3D Convolutions for Human Action Recognition

Alexandros Stergiou; Ronald Poppe

arXiv:1909.13474 · cs.CV · June 24, 2020

TL;DR

This paper introduces FAST 3D convolutions, a decomposition of regular 3D convolutions that models horizontal and vertical motion patterns with separate spatio-temporal convolutions, improving human action recognition accuracy in videos.

Contribution

The paper proposes a new FAST 3D convolution block that decomposes a regular 3D convolution into a 2D spatial convolution followed by spatio-temporal convolutions in the horizontal and vertical directions, improving action recognition performance over regular 3D convolutions.
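
The sequential decomposition can be sketched in NumPy as below. This is an illustrative sketch, not the authors' implementation: the kernel shapes (1 x k x k spatial, t x 1 x k horizontal, t x k x 1 vertical, in (T, H, W) order), the ReLU between stages, the zero "same" padding and the helper names `conv3d_same` and `fast_sequential` are all assumptions, and batch normalization is omitted. It needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_same(x, w):
    """Zero-padded ('same') 3D cross-correlation, as computed by CNN layers.

    x: input clip, shape (C_in, T, H, W)
    w: kernel, shape (C_out, C_in, kT, kH, kW) with odd kT, kH, kW
    returns: shape (C_out, T, H, W)
    """
    _, _, kt, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (kt // 2,) * 2, (kh // 2,) * 2, (kw // 2,) * 2))
    # windows has shape (C_in, T, H, W, kT, kH, kW)
    windows = sliding_window_view(xp, (kt, kh, kw), axis=(1, 2, 3))
    # contract over input channels and the three kernel axes
    return np.einsum("cthwabd,ocabd->othw", windows, w)


def relu(x):
    return np.maximum(x, 0.0)


def fast_sequential(x, w_spatial, w_horizontal, w_vertical):
    """Hypothetical sequential FAST block (assumed shapes, (T, H, W) order):

    w_spatial:    1 x k x k  2D spatial convolution
    w_horizontal: t x 1 x k  spans time and the horizontal (W) axis
    w_vertical:   t x k x 1  spans time and the vertical (H) axis
    The ReLUs between stages are an assumption of this sketch.
    """
    y = relu(conv3d_same(x, w_spatial))
    y = relu(conv3d_same(y, w_horizontal))
    return conv3d_same(y, w_vertical)


rng = np.random.default_rng(0)
c_in, c_mid, c_out, t, k = 3, 8, 16, 3, 3
x = rng.standard_normal((c_in, 8, 32, 32))          # 8 frames of 32x32
w_s = rng.standard_normal((c_mid, c_in, 1, k, k))
w_h = rng.standard_normal((c_mid, c_mid, t, 1, k))
w_v = rng.standard_normal((c_out, c_mid, t, k, 1))
y = fast_sequential(x, w_s, w_h, w_v)
print(y.shape)  # (16, 8, 32, 32)
```

With t = k = 3 and equal channel widths C, the three assumed kernels hold 9C² + 9C² + 9C² = 27C² weights, the same as a single 3x3x3 kernel, so in this sketch the decomposition changes how the kernel is factorized and adds intermediate nonlinearities rather than reducing the weight count.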

Findings

01

FAST 3D convolutions outperform regular 3D convolutions on the UCF-101 and HMDB-51 datasets.

02

Decomposed convolutions lead to lower validation loss and better generalization.

03

DenseNet-121 with FAST 3D convolutions achieves the best performance among the evaluated architectures.

Abstract

Effective processing of video input is essential for the recognition of temporally varying events such as human actions. Motivated by the often distinctive temporal characteristics of actions in either the horizontal or vertical direction, we introduce a novel convolution block for CNN architectures with video input. Our proposed Fractioned Adjacent Spatial and Temporal (FAST) 3D convolutions are a natural decomposition of a regular 3D convolution. Each convolution block consists of three sequential convolution operations: a 2D spatial convolution followed by spatio-temporal convolutions in the horizontal and vertical direction, respectively. Additionally, we introduce a FAST variant that treats horizontal and vertical motion in parallel. Experiments on benchmark action recognition datasets UCF-101 and HMDB-51 with ResNet architectures demonstrate consistently increased performance of FAST 3D…
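
The parallel variant mentioned in the abstract can be sketched the same way. This is again a hypothetical NumPy illustration, not the authors' code: the kernel shapes, the ReLU after the shared spatial convolution, the helper name `fast_parallel` and, in particular, the channel-wise concatenation used to merge the horizontal and vertical branches are assumptions, since the abstract does not state how the two branches are combined. It needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_same(x, w):
    """Zero-padded ('same') 3D cross-correlation.

    x: (C_in, T, H, W); w: (C_out, C_in, kT, kH, kW), odd kernel sizes.
    returns: (C_out, T, H, W)
    """
    _, _, kt, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (kt // 2,) * 2, (kh // 2,) * 2, (kw // 2,) * 2))
    windows = sliding_window_view(xp, (kt, kh, kw), axis=(1, 2, 3))
    return np.einsum("cthwabd,ocabd->othw", windows, w)


def fast_parallel(x, w_spatial, w_horizontal, w_vertical):
    """Hypothetical parallel FAST variant.

    A shared 2D spatial convolution (1 x k x k) feeds two branches that run
    side by side: horizontal (t x 1 x k) and vertical (t x k x 1).
    ASSUMPTION: the branches are merged by channel-wise concatenation.
    """
    s = np.maximum(conv3d_same(x, w_spatial), 0.0)  # shared spatial features
    h = conv3d_same(s, w_horizontal)                # horizontal motion branch
    v = conv3d_same(s, w_vertical)                  # vertical motion branch
    return np.concatenate([h, v], axis=0)           # stack along channels


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 32, 32))    # 3 channels, 8 frames of 32x32
w_s = rng.standard_normal((8, 3, 1, 3, 3))
w_h = rng.standard_normal((8, 8, 3, 1, 3))
w_v = rng.standard_normal((8, 8, 3, 3, 1))
y = fast_parallel(x, w_s, w_h, w_v)
print(y.shape)  # (16, 8, 32, 32)
```

Concatenation keeps the horizontal and vertical responses in separate channels so later layers can weight them independently; summing the branches would be an equally simple alternative that keeps the channel count of a single branch.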

Taxonomy

Methods: 3D Convolution · Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling