Self-supervised learning of class embeddings from video

Olivia Wiles; A. Sophia Koepke; Andrew Zisserman

arXiv:1910.12699·cs.CV·October 29, 2019

TL;DR

This paper presents a self-supervised approach that learns, from video, class-specific image embeddings that encode pose and shape. The embeddings transfer to downstream tasks and achieve state-of-the-art results among self-supervised methods, without any labels.

Contribution

Introduces a hierarchical probabilistic decoder that makes it possible to learn class-specific embeddings from video; the approach applies to several deformable object classes and outperforms existing self-supervised methods.

Findings

01

Achieves state-of-the-art performance among self-supervised methods on three deformable object classes: human full bodies, upper bodies, and faces.

02

The learned embeddings generalize well across different domains.

03

Approaches the performance of supervised methods without using labels.

Abstract

This work explores how to use self-supervised learning on videos to learn a class-specific image embedding that encodes pose and shape information. At train time, two frames of the same video of an object class (e.g. human upper body) are extracted and each encoded to an embedding. Conditioned on these embeddings, the decoder network is tasked to transform one frame into another. To successfully perform long range transformations (e.g. a wrist lowered in one image should be mapped to the same wrist raised in another), we introduce a hierarchical probabilistic network decoder model. Once trained, the embedding can be used for a variety of downstream tasks and domains. We demonstrate our approach quantitatively on three distinct deformable object classes -- human full bodies, upper bodies, faces -- and show experimentally that the learned embeddings do indeed generalise. They achieve…
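The training signal described in the abstract (embed two frames of the same video, then have a decoder conditioned on both embeddings map one frame to the other) can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions: the names `sample_frame_pair`, `training_loss`, `toy_encoder`, `toy_decoder` and `make_toy_video` are hypothetical, the L1 reconstruction loss is an assumed choice, and the shift-based toy decoder is a stand-in, not the paper's hierarchical probabilistic decoder.

```python
import random
from typing import Callable, List, Sequence, Tuple

Frame = List[List[float]]   # grayscale frame: rows of pixel intensities
Embedding = List[float]     # embedding vector for one frame


def sample_frame_pair(video: Sequence[Frame], rng: random.Random) -> Tuple[Frame, Frame]:
    """Draw two distinct frames (source, target) from the same video."""
    i, j = rng.sample(range(len(video)), 2)
    return video[i], video[j]


def l1_loss(pred: Frame, target: Frame) -> float:
    """Mean absolute pixel error (an assumed loss, for illustration only)."""
    total = sum(abs(p - t) for rp, rt in zip(pred, target) for p, t in zip(rp, rt))
    return total / sum(len(row) for row in target)


def training_loss(
    video: Sequence[Frame],
    encoder: Callable[[Frame], Embedding],
    decoder: Callable[[Frame, Embedding, Embedding], Frame],
    rng: random.Random,
) -> float:
    """One self-supervised step: embed both frames, map source to target, score it."""
    source, target = sample_frame_pair(video, rng)
    z_src, z_tgt = encoder(source), encoder(target)
    pred = decoder(source, z_src, z_tgt)  # decoder is conditioned on both embeddings
    return l1_loss(pred, target)


# --- Hypothetical toy components, for illustration only ---

def toy_encoder(frame: Frame) -> Embedding:
    """Toy 'pose' embedding: column index of the brightest pixel."""
    _, col = max((v, c) for row in frame for c, v in enumerate(row))
    return [float(col)]


def toy_decoder(source: Frame, z_src: Embedding, z_tgt: Embedding) -> Frame:
    """Toy decoder: shift the source horizontally by the embedding difference."""
    dx = int(round(z_tgt[0] - z_src[0]))
    width = len(source[0])
    return [[row[c - dx] if 0 <= c - dx < width else 0.0 for c in range(width)]
            for row in source]


def make_toy_video(width: int = 5, height: int = 2) -> List[Frame]:
    """A toy 'video' in which one bright pixel moves one column per frame."""
    video = []
    for t in range(width):
        frame = [[0.0] * width for _ in range(height)]
        frame[0][t] = 1.0
        video.append(frame)
    return video


if __name__ == "__main__":
    rng = random.Random(0)
    print(training_loss(make_toy_video(), toy_encoder, toy_decoder, rng))  # prints 0.0
```

In this toy setting the decoder can only reconstruct the target if the embeddings capture where the moving part is; a decoder that ignores the embeddings and returns the source unchanged scores a positive loss instead. That is the intuition behind using frame-to-frame reconstruction as a self-supervised signal, though the real model learns both encoder and decoder rather than hard-coding them.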
