Unsupervised Representation Learning by Sorting Sequences

Hsin-Ying Lee; Jia-Bin Huang; Maneesh Singh; Ming-Hsuan Yang

arXiv:1708.01246 · cs.CV · August 4, 2017 · 27 cites

TL;DR

This paper introduces an unsupervised method for learning visual representations by training a neural network to sort shuffled video frames, leveraging temporal coherence as a supervisory signal.

Contribution

The paper proposes sequence sorting as a proxy task for unsupervised learning from videos: a network trained to restore the chronological order of shuffled frames learns rich, generalizable visual features without labeled data.

Findings

01 Compares favorably against state-of-the-art methods on action recognition

02 Used as pre-training, the learned representation improves image classification accuracy

03 Used as pre-training, the learned representation improves object detection performance

Abstract

We present an unsupervised representation learning approach using videos without semantic labels. We leverage temporal coherence as a supervisory signal by formulating representation learning as a sequence sorting task. We take temporally shuffled frames (i.e., in non-chronological order) as inputs and train a convolutional neural network to sort the shuffled sequences. Similar to comparison-based sorting algorithms, we propose to extract features from all frame pairs and aggregate them to predict the correct order. Because sorting a shuffled image sequence requires an understanding of the statistical temporal structure of images, training with such a proxy task allows us to learn rich and generalizable visual representations. We validate the effectiveness of the learned representation by using our method as pre-training on high-level recognition problems. The experimental results show that…
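The data side of the sorting task described in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names (`make_sorting_sample`, `all_frame_pairs`, `pairwise_features`, `recover_order`) are hypothetical, the label space is assumed to be every permutation of the clip, and plain list concatenation stands in for the learned pairwise feature extraction.

```python
import itertools
import random


def make_sorting_sample(frames, rng):
    """Shuffle a short clip; the label is the index of the permutation applied.

    Assumption: one class per permutation of the clip (n! classes).
    """
    perms = list(itertools.permutations(range(len(frames))))
    label = rng.randrange(len(perms))
    order = perms[label]                    # shuffled[k] == frames[order[k]]
    shuffled = [frames[i] for i in order]
    return shuffled, label, perms


def all_frame_pairs(n):
    """Every unordered pair of positions, as in comparison-based sorting."""
    return list(itertools.combinations(range(n), 2))


def pairwise_features(frame_features):
    """Concatenate per-frame features for each pair.

    Concatenation is a placeholder for a learned pairwise layer; the
    aggregated pair features would feed an order classifier.
    """
    pairs = all_frame_pairs(len(frame_features))
    return [frame_features[i] + frame_features[j] for i, j in pairs]


def recover_order(shuffled, label, perms):
    """Undo the shuffle given the (predicted or true) permutation label."""
    order = perms[label]
    restored = [None] * len(shuffled)
    for k, i in enumerate(order):
        restored[i] = shuffled[k]
    return restored


if __name__ == "__main__":
    rng = random.Random(0)
    frames = ["f0", "f1", "f2", "f3"]       # stand-ins for video frames
    shuffled, label, perms = make_sorting_sample(frames, rng)
    print(shuffled, label)
    print(recover_order(shuffled, label, perms))
```

In a training loop under these assumptions, the shuffled frames would pass through a shared convolutional network, the pair features would be aggregated, and the classifier would be trained with the permutation label as its target. No manual annotation is needed because the label comes from the shuffle itself.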

Code & Models

Repositories

HsinYingLee/OPN
caffe2 · Official

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques