VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang

TL;DR
VIP introduces a self-supervised pre-training method using human videos to generate dense, smooth reward functions for unseen robotic tasks, enabling effective reward-based control without task-specific data.
Contribution
The paper presents VIP, a novel implicit time contrastive objective for visual representation pre-training that produces reward functions from human videos without action labels.
Findings
VIP outperforms prior representations on robotic control tasks.
VIP enables few-shot offline RL from as few as 20 trajectories.
VIP works effectively on both simulated and real robots.
Abstract
Reward and representation learning are two long-standing challenges for learning an expanding set of robot manipulation skills from sensory observations. Given the inherent cost and scarcity of in-domain, task-specific robot data, learning from large, diverse, offline human videos has emerged as a promising path towards acquiring a generally useful visual representation for control; however, how these human videos can be used for general-purpose reward learning remains an open question. We introduce Value-Implicit Pre-training (VIP), a self-supervised pre-trained visual representation capable of generating dense and smooth reward functions for unseen robotic tasks. VIP casts representation learning from human videos as an offline goal-conditioned reinforcement learning problem and derives a self-supervised dual goal-conditioned value-function objective…
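To make the idea of a dense reward from a frozen representation concrete, here is a minimal sketch. It assumes that the goal-conditioned value of an observation is the negative L2 distance between its embedding and the goal image's embedding, and that the per-step reward is the change in that value between consecutive frames. The function names and the toy 2-D "embeddings" are hypothetical stand-ins for the output of a pre-trained visual encoder; this is an illustration, not the authors' implementation.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def value(embedding, goal_embedding):
    """Assumed value: negative embedding distance to the goal (0 at the goal)."""
    return -l2_distance(embedding, goal_embedding)


def dense_rewards(embeddings, goal_embedding):
    """Per-step reward as the value difference between consecutive frames."""
    values = [value(e, goal_embedding) for e in embeddings]
    return [v_next - v for v, v_next in zip(values, values[1:])]


# Toy trajectory: hypothetical 2-D embeddings moving steadily toward the goal.
trajectory = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
goal = [3.0, 0.0]

rewards = dense_rewards(trajectory, goal)
print(rewards)  # [1.0, 1.0, 1.0]
```

Under these assumptions every step that moves the embedding closer to the goal earns a positive reward, and the rewards along a trajectory sum to the total change in value from the first frame to the last.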
Taxonomy
Topics: Neuroinflammation and Neurodegeneration Mechanisms · Neural dynamics and brain function · Reinforcement Learning in Robotics
