Time-Contrastive Networks: Self-Supervised Learning from Video

Pierre Sermanet; Corey Lynch; Yevgen Chebotar; Jasmine Hsu; Eric Jang; Stefan Schaal; Sergey Levine

arXiv:1704.06888·cs.CV·March 21, 2018·53 cites

TL;DR

This paper introduces a self-supervised learning method called Time-Contrastive Networks that learns viewpoint-invariant representations from unlabeled videos, enabling robots to imitate human behaviors and poses without explicit supervision.

Contribution

The paper presents a novel self-supervised approach using metric learning to learn robust, viewpoint-invariant representations from unlabeled videos for robotic imitation tasks.

Findings

01. Robots can imitate human poses directly from learned representations.

02. The learned representations serve as effective reward functions for reinforcement learning.

03. The approach works with minimal demonstration data, enabling practical robotic learning.
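
To make the second finding concrete, here is a minimal sketch of how an embedding could be turned into a reward signal. It assumes the reward is the negative squared Euclidean distance between the embedding of the robot's current observation and the embedding of the demonstration frame at the same time step. The function name `embedding_reward` and this exact reward form are illustrative assumptions, not details stated on this page.

```python
from typing import Sequence


def embedding_reward(robot_embedding: Sequence[float],
                     demo_embedding: Sequence[float]) -> float:
    """Hypothetical reward: negative squared distance in embedding space.

    The reward is highest (zero) when the robot's embedded observation
    matches the embedded demonstration frame, and decreases as they diverge.
    """
    if len(robot_embedding) != len(demo_embedding):
        raise ValueError("embeddings must have the same dimensionality")
    return -sum((r - d) ** 2 for r, d in zip(robot_embedding, demo_embedding))


# Usage: the closer embedding receives the larger reward.
demo = [0.0, 0.0]
near = embedding_reward([0.1, 0.0], demo)
far = embedding_reward([3.0, 4.0], demo)
```

A reinforcement learning algorithm would then maximize the sum of such per-step rewards over a trajectory; which algorithm to pair it with is left open here.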

Abstract

We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking…
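
The attract-and-repel objective described in the abstract can be sketched as a triplet loss, one common form of metric learning loss. In this sketch the anchor and positive are embeddings of the same moment seen from two viewpoints, and the negative is an embedding of a temporally nearby frame from the anchor's view. The margin value, the `min_gap` sampling rule, and all function names are assumptions made for illustration, not settings taken from this page.

```python
import random
from typing import Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.2) -> float:
    """Hinge loss: pull the positive closer to the anchor than the negative.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed hyperparameter).
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def sample_triplet_times(num_frames: int, min_gap: int,
                         rng: random.Random) -> Tuple[int, int]:
    """Pick an anchor time t and a negative time at least `min_gap` away.

    The positive shares time t with the anchor but comes from another
    viewpoint, so only two time indices are needed.
    """
    t = rng.randrange(num_frames)
    candidates = [s for s in range(num_frames) if abs(s - t) >= min_gap]
    if not candidates:
        raise ValueError("video too short for the requested min_gap")
    return t, rng.choice(candidates)


# Usage with hand-written 2-D embeddings.
anchor = [0.0, 1.0]          # view 1, time t
positive = [0.1, 0.9]        # view 2, time t
good_negative = [1.0, 0.0]   # view 1, different time, well separated
bad_negative = [0.0, 1.0]    # view 1, different time, collapsed onto anchor

satisfied = triplet_loss(anchor, positive, good_negative)
violated = triplet_loss(anchor, positive, bad_negative)
t, t_neg = sample_triplet_times(num_frames=10, min_gap=2, rng=random.Random(0))
```

In the usage above, the well-separated negative yields zero loss, while the negative that collapses onto the anchor is penalized, which is the behavior that discourages the model from treating visually similar neighboring frames as identical.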

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Robot Manipulation and Learning

Methods: 1cycle learning rate scheduling policy