Self-supervised learning using consistency regularization of spatio-temporal data augmentation for action recognition

Jinpeng Wang; Yiqi Lin; Andy J. Ma

arXiv:2008.02086·cs.CV·August 6, 2020

TL;DR

This paper introduces a self-supervised learning approach for action recognition that combines spatio-temporal consistency regularization with video-specific data augmentations, with reported relative improvements of 22% on HMDB51 and 7% on UCF101.

Contribution

It proposes a new consistency regularization framework using high-level feature maps and develops two video-specific data augmentation techniques for better action feature extraction.

Findings

01. Achieves 22% relative improvement on HMDB51
02. Achieves 7% relative improvement on UCF101
03. Outperforms state-of-the-art self-supervised methods

Abstract

Self-supervised learning has shown great potential for improving deep learning models in an unsupervised manner by constructing surrogate supervision signals directly from unlabeled data. Unlike existing work, we present a novel way to obtain the surrogate supervision signal based on high-level feature maps under consistency regularization. In this paper, we propose Spatio-Temporal Consistency Regularization between the output features generated by a siamese network, which comprises a clean path fed with the original video and a noise path fed with the corresponding augmented video. Based on the spatio-temporal characteristics of video, we develop two video-based data augmentation methods, i.e., Spatio-Temporal Transformation and Intra-Video Mixup. Consistency of the former is proposed to model transformation consistency of features, while the latter aims at…
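
The clean-path / noise-path idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy `encoder`, the exact form of `intra_video_mixup` (a convex blend of every frame with one frame from the same clip), the mean-squared-error `consistency_loss`, and all shapes and parameter values are assumptions chosen only to make the mechanism concrete.

```python
import numpy as np

def encoder(clip, weights):
    """Toy stand-in for the shared (siamese) backbone. Hypothetical:
    global average pooling followed by one linear map."""
    pooled = clip.mean(axis=(0, 1, 2))   # (T, H, W, C) -> (C,)
    return pooled @ weights              # (C,) -> (D,)

def intra_video_mixup(clip, lam, rng):
    """Assumed form of the augmentation: blend every frame with one
    randomly chosen frame of the same clip, using mixing weight lam."""
    t = rng.integers(clip.shape[0])
    return lam * clip + (1.0 - lam) * clip[t][None]

def consistency_loss(clean_feat, noisy_feat):
    """Assumed loss: mean squared error between the two paths' features."""
    return float(np.mean((clean_feat - noisy_feat) ** 2))

rng = np.random.default_rng(0)
clip = rng.random((8, 16, 16, 3))        # 8 frames of 16x16 RGB (illustrative)
weights = rng.standard_normal((3, 4))    # the same weights serve both paths

clean = encoder(clip, weights)                                       # clean path
noisy = encoder(intra_video_mixup(clip, lam=0.7, rng=rng), weights)  # noise path
loss = consistency_loss(clean, noisy)
```

Minimizing `loss` with respect to the shared weights would push the two paths toward similar features for the original and the augmented clip. In the paper the consistency is described as operating on high-level feature maps of a siamese network, so a real backbone would replace the pooled linear map used here.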

Code & Models

Repositories

FingerRec/Self-Supervised-Temporal-Discriminative-Representation-Learning-for-Video-Action-Recognition
pytorch

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Hand Gesture Recognition Systems

Methods: Mixup · Siamese Network