Gate-Shift Networks for Video Action Recognition

Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz

arXiv:1912.00381·cs.CV·March 24, 2020

TL;DR

This paper introduces the Gate-Shift Module (GSM), a lightweight component that lets a 2D CNN adaptively route features through time for video action recognition. It achieves state-of-the-art results on Something Something-V1 and Diving48 with almost no additional parameters or computation.

Contribution

The paper proposes GSM, a novel, efficient module that improves 2D CNNs for video recognition by incorporating spatial gating for better spatio-temporal feature learning.

Findings

01. Achieves state-of-the-art results on the Something Something-V1 and Diving48 datasets.

02. Obtains competitive results on EPIC-Kitchens with lower model complexity.

03. GSM adds minimal parameters and computational overhead.

Abstract

Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space. In practice, however, because of the large number of parameters and computations involved, they may underperform when sufficiently large datasets are not available for training them at scale. In this paper we introduce spatial gating in the spatial-temporal decomposition of 3D kernels. We implement this concept with the Gate-Shift Module (GSM). GSM is lightweight and turns a 2D-CNN into a highly efficient spatio-temporal feature extractor. With GSM plugged in, a 2D-CNN learns to adaptively route features through time and combine them, with almost no additional parameters or computational overhead. We perform an extensive evaluation of the proposed module to study its effectiveness in video action recognition, achieving state-of-the-art results on Something Something-V1 and…
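The abstract describes gating features and routing them through time but does not give the module's equations, so the following is only a minimal sketch of the general gate-then-shift idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function name `gate_shift`, the per-channel scalar `gate_weights` (standing in for a learned spatial convolution), the tanh gate, and the choice to shift one half of the channels forward in time and the other half backward.

```python
import math


def gate_shift(frames, gate_weights):
    """Toy gate-then-shift over a clip.

    frames:       list of T frames, each a list of C feature values (C even).
    gate_weights: list of C hypothetical per-channel gate weights
                  (a stand-in for a learned spatial gating layer).
    Returns a new T x C list of fused features.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    half = num_channels // 2

    # Gate each feature: tanh(w * x) decides how much of x is routed through time.
    gated = [
        [math.tanh(gate_weights[c] * frame[c]) * frame[c] for c in range(num_channels)]
        for frame in frames
    ]
    # Whatever is not routed stays in its own frame as a residual.
    residual = [
        [frames[t][c] - gated[t][c] for c in range(num_channels)]
        for t in range(num_frames)
    ]

    fused = []
    for t in range(num_frames):
        row = []
        for c in range(num_channels):
            # First half of the channels is shifted forward in time (frame t
            # receives from t - 1); second half is shifted backward (from t + 1).
            source = t - 1 if c < half else t + 1
            shifted = gated[source][c] if 0 <= source < num_frames else 0.0
            row.append(residual[t][c] + shifted)
        fused.append(row)
    return fused


clip = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
closed = gate_shift(clip, [0.0, 0.0])        # gate closed: nothing is routed
opened = gate_shift(clip, [1000.0, 1000.0])  # gate saturated: pure temporal shift
```

In this toy setting a zero gate weight leaves every frame unchanged, which mirrors a plain 2D network, while a saturated gate turns the operation into a fixed temporal shift. A learned gate would sit between these two extremes, which is one way to read "adaptively route features through time and combine them".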
