Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition

Unaiza Ahsan; Rishi Madhok; Irfan Essa

arXiv:1808.07507 · cs.CV · August 24, 2018 · 5 cites

TL;DR

This paper introduces a self-supervised learning method that combines spatial and temporal context in videos by solving jigsaw puzzles across multiple frames, enabling effective action recognition without heavy preprocessing.

Contribution

It presents a novel joint spatial-temporal self-supervised framework using a new permutation strategy for video understanding tasks.

Findings

01. Achieves strong performance on benchmark datasets.

02. No labeled data needed for pretraining.

03. Outperforms previous methods in unsupervised video recognition.

Abstract

We propose a self-supervised learning method to jointly reason about spatial and temporal context for video recognition. Recent self-supervised approaches have used spatial context [9, 34] as well as temporal coherency [32], but a combination of the two requires extensive preprocessing, such as tracking objects through millions of video frames [59] or computing optical flow to determine frame regions with high motion [30]. We propose to combine spatial and temporal context in one self-supervised framework without any heavy preprocessing. We divide multiple video frames into grids of patches and train a network to solve jigsaw puzzles on these patches from multiple frames. The network is thus trained to correctly identify the position of a patch within a video frame as well as the position of a patch over time. We also propose a novel permutation strategy that outperforms random permutations…
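
The puzzle construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame count (3), grid size (2 × 2), frame resolution, number of permutations (10), and the helper names `split_into_patches` and `make_puzzle` are all assumptions chosen for illustration. The permutation set here is drawn at random as a stand-in, since the paper's own permutation strategy is not described in the text above.

```python
import numpy as np


def split_into_patches(frames, grid=2):
    """Cut frames of shape (T, H, W, C) into T * grid * grid patches.

    Patches are ordered by frame, then row, then column, so each index
    encodes both a spatial position and a temporal position.
    """
    t, h, w, c = frames.shape
    ph, pw = h // grid, w // grid
    patches = [
        frames[f, r * ph:(r + 1) * ph, col * pw:(col + 1) * pw]
        for f in range(t)
        for r in range(grid)
        for col in range(grid)
    ]
    return np.stack(patches)  # shape: (T * grid * grid, ph, pw, C)


def make_puzzle(frames, permutations, rng, grid=2):
    """Shuffle the patches with one permutation from a fixed set.

    Returns the shuffled patches and the index of the permutation used;
    that index is the classification target for the network.
    """
    patches = split_into_patches(frames, grid)
    label = int(rng.integers(len(permutations)))
    shuffled = patches[permutations[label]]
    return shuffled, label


rng = np.random.default_rng(0)

# Hypothetical clip: 3 frames of 64 x 64 RGB noise standing in for video.
frames = rng.random((3, 64, 64, 3))
num_patches = 3 * 2 * 2

# Assumption: a small, randomly drawn permutation set (may contain repeats).
permutations = np.stack([rng.permutation(num_patches) for _ in range(10)])

shuffled, label = make_puzzle(frames, permutations, rng)
print(shuffled.shape, label)
```

Because patches from all frames are shuffled together, predicting the permutation index requires locating each patch both within its frame and across time, which is the joint spatial and temporal reasoning the abstract refers to.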

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: Jigsaw