Shuffle and Learn: Unsupervised Learning using Temporal Order Verification

Ishan Misra; C. Lawrence Zitnick; Martial Hebert

arXiv:1603.08561·cs.CV·July 27, 2016·20 cites

TL;DR

This paper introduces an unsupervised method for learning visual representations from videos by verifying the correct temporal order of frames, which improves action recognition and pose estimation without relying on semantic labels.

Contribution

The authors propose a novel unsupervised sequential verification task using CNNs to learn visual features from videos, capturing temporally varying information without semantic labels.

Findings

1. When used as pre-training, improves action recognition accuracy on UCF101 and HMDB51 over learning without external data.

2. Achieves competitive pose estimation results on the FLIC and MPII datasets.

3. Learns features that are complementary to those learned from supervised image datasets such as ImageNet.

Abstract

In this paper, we present an approach for learning a visual representation from the raw spatiotemporal signals in videos. Our representation is learned without supervision from semantic labels. We formulate our method as an unsupervised sequential verification task, i.e., we determine whether a sequence of frames from a video is in the correct temporal order. With this simple task and no semantic labels, we learn a powerful visual representation using a Convolutional Neural Network (CNN). The representation contains complementary information to that learned from supervised image datasets like ImageNet. Qualitative results show that our method captures information that is temporally varying, such as human pose. When used as pre-training for action recognition, our method gives significant gains over learning without external data on benchmark datasets like UCF101 and HMDB51. To…
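
The verification task described in the abstract can be illustrated with a minimal sketch of how labelled training examples might be drawn from a single video. This is not the authors' sampling procedure: the function names `sample_order_verification_example` and `is_temporally_ordered`, the tuple length of 3, the uniform sampling of frame indices, and the choice to treat every non-increasing ordering (including a fully reversed one) as a negative are all assumptions made for illustration.

```python
import random


def is_temporally_ordered(indices):
    """Return True if the frame indices are strictly increasing."""
    return all(a < b for a, b in zip(indices, indices[1:]))


def sample_order_verification_example(num_frames, tuple_len=3, rng=random):
    """Draw one (frame_indices, label) pair for order verification.

    label is 1 when the frames are in temporal order and 0 when they
    have been shuffled. Hypothetical helper: uniform sampling and a
    50/50 positive/negative split are assumptions, not the paper's scheme.
    """
    if tuple_len < 2 or num_frames < tuple_len:
        raise ValueError("need at least tuple_len >= 2 distinct frames")

    # Pick distinct frame indices and put them in their true order.
    ordered = sorted(rng.sample(range(num_frames), tuple_len))

    # Positive example: keep the true temporal order.
    if rng.random() < 0.5:
        return ordered, 1

    # Negative example: reshuffle until the order actually changes.
    shuffled = ordered[:]
    while shuffled == ordered:
        rng.shuffle(shuffled)
    return shuffled, 0


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        frames, label = sample_order_verification_example(30, rng=rng)
        print(frames, label)
```

In a full pipeline, the frames at the sampled indices would be passed through a CNN and a binary classifier would be trained on the labels; that part is omitted here because the abstract does not specify the architecture.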

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Digital Imaging for Blood Diseases