Gate-Shift-Fuse for Video Action Recognition

Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz

arXiv:2203.08897 · cs.CV · April 18, 2023

TL;DR

This paper introduces Gate-Shift-Fuse (GSF), a novel module that enhances 2D CNNs for video action recognition by adaptively modeling spatio-temporal features with minimal overhead, achieving state-of-the-art results.

Contribution

The paper proposes GSF, a data-driven, learnable spatio-temporal feature extraction module that can be integrated into 2D CNNs to improve video action recognition performance.

Findings

01. Achieves state-of-the-art results on five benchmarks.

02. Increases spatio-temporal modeling capacity with negligible overhead.

03. Demonstrates compatibility with popular 2D CNN architectures.

Abstract

Convolutional Neural Networks are the de facto models for image recognition. However, 3D CNNs, the straightforward extension of 2D CNNs to video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance is their increased computational complexity, which requires large-scale annotated datasets to train them at scale. 3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs, but existing approaches follow hand-designed and hard-wired techniques. In this paper, we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module that controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data-dependent manner. GSF leverages grouped spatial gating to decompose…
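To make the gate, shift, and fuse steps named in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `gate_shift_fuse`, the two channel groups, the tanh gate, the zero-padded one-frame shift, and the sigmoid fusion weights are all assumptions chosen for illustration, and `gate_logits` and `fuse_logits` are hypothetical stand-ins for the outputs of learned layers.

```python
import numpy as np


def gate_shift_fuse(x, gate_logits, fuse_logits):
    """Illustrative gate -> shift -> fuse step (assumed design, not the paper's code).

    x:           (T, C, H, W) per-frame feature maps for one clip, C even.
    gate_logits: (T, 2, H, W) one spatial gating map per channel group
                 (hypothetical stand-in for a learned gating layer's output).
    fuse_logits: (C,) per-channel fusion logits (hypothetical stand-in for
                 learned fusion weights).
    """
    T, C, H, W = x.shape
    half = C // 2

    # Gate: a tanh gating map per group decides how much of each spatial
    # location is routed through time; the remainder stays in place.
    gate = np.tanh(gate_logits)
    gated_a = x[:, :half] * gate[:, 0:1]
    gated_b = x[:, half:] * gate[:, 1:2]
    residual = np.concatenate([x[:, :half] - gated_a, x[:, half:] - gated_b], axis=1)

    # Shift: group A moves one frame forward, group B one frame backward,
    # with zero padding at the clip boundaries.
    shifted_a = np.zeros_like(gated_a)
    shifted_a[1:] = gated_a[:-1]
    shifted_b = np.zeros_like(gated_b)
    shifted_b[:-1] = gated_b[1:]
    shifted = np.concatenate([shifted_a, shifted_b], axis=1)

    # Fuse: per-channel convex combination of shifted and residual features.
    w = (1.0 / (1.0 + np.exp(-fuse_logits))).reshape(1, C, 1, 1)
    return w * shifted + (1.0 - w) * residual


# Usage: the output keeps the input shape, so a block like this could sit
# between layers of a 2D CNN that processes frames independently.
rng = np.random.default_rng(0)
clip = rng.standard_normal((4, 8, 3, 3))
out = gate_shift_fuse(clip, rng.standard_normal((4, 2, 3, 3)), rng.standard_normal(8))
print(out.shape)
```

In this sketch a gate of zero leaves every feature in its own frame, while a saturated gate moves it entirely to a neighbouring frame, so the amount of temporal mixing depends on the data rather than being fixed in advance.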

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Gait Recognition and Analysis