Time-Contrastive Networks: Self-Supervised Learning from Video
Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine

TL;DR
This paper introduces a self-supervised learning method called Time-Contrastive Networks that learns viewpoint-invariant representations from unlabeled videos, enabling robots to imitate human behaviors and poses without explicit supervision.
Contribution
The paper presents a novel self-supervised approach using metric learning to learn robust, viewpoint-invariant representations from unlabeled videos for robotic imitation tasks.
Findings
Robots can imitate human poses directly from learned representations.
The learned representations serve as effective reward functions for reinforcement learning.
The approach works with minimal demonstration data, enabling practical robotic learning.
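One way to picture the reward-function finding is as a distance in embedding space between what the robot currently sees and the corresponding frame of a demonstration. The sketch below is illustrative only: the negative squared Euclidean distance is an assumed reward form, and the names `embedding_reward` and `trajectory_rewards` are hypothetical, not taken from the paper.

```python
def embedding_reward(robot_embedding, demo_embedding):
    """Assumed reward: negative squared Euclidean distance between embeddings.

    The reward is 0 when the robot's embedding matches the demonstration's
    and becomes more negative as the two drift apart.
    """
    return -sum((r - d) ** 2 for r, d in zip(robot_embedding, demo_embedding))


def trajectory_rewards(robot_embeddings, demo_embeddings):
    """Per-timestep rewards for two time-aligned embedding sequences."""
    return [
        embedding_reward(r, d)
        for r, d in zip(robot_embeddings, demo_embeddings)
    ]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for the output of a trained network.
    demo = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
    robot = [[0.0, 0.0], [0.5, 0.0], [1.0, 1.0]]
    print(trajectory_rewards(robot, demo))
```

A reinforcement learning algorithm would then maximize the sum of these per-timestep rewards, pushing the robot's trajectory toward the demonstration in embedding space.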
Abstract
We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking…
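The attract/repel objective described in the abstract can be sketched as a triplet loss: the anchor and positive are embeddings of the same moment seen from two viewpoints, and the negative is an embedding of a nearby but different moment. This is a minimal sketch under stated assumptions; the hinge form, the `margin` value, the `min_gap` sampling rule, and all function names are illustrative choices, not details confirmed by the text above.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def time_contrastive_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss (assumed form).

    anchor, positive: embeddings of the same time step from two viewpoints.
    negative: embedding of a temporal neighbor of the anchor.
    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (in squared distance).
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def sample_negative_index(t, num_frames, min_gap, rng):
    """Pick a frame index at least `min_gap` frames away from time step t.

    `min_gap` is a hypothetical parameter that keeps near-identical
    adjacent frames from being used as negatives.
    """
    candidates = [i for i in range(num_frames) if abs(i - t) >= min_gap]
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    t = 5
    neg_t = sample_negative_index(t, num_frames=20, min_gap=3, rng=rng)
    print("anchor frame:", t, "negative frame:", neg_t)
    # Toy embeddings: the two views of frame t are close, the neighbor is far.
    print(time_contrastive_triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]))
```

In a real training setup the embeddings would come from a neural network applied to video frames, and the loss would be averaged over many sampled triplets.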
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Robot Manipulation and Learning
Methods: 1cycle learning rate scheduling policy
