LIV: Language-Image Representations and Rewards for Robotic Control

Yecheng Jason Ma; William Liang; Vaidehi Som; Vikash Kumar; Amy Zhang; Osbert Bastani; Dinesh Jayaraman

arXiv:2306.00958·cs.RO·June 2, 2023·24 cites

TL;DR

LIV introduces a unified vision-language representation and reward learning framework from videos and text, enabling robots to understand and achieve goals in unseen environments with improved control and reward specification.

Contribution

The paper presents LIV, the first control-centric vision-language representation trained on large human video datasets, combining dual reinforcement learning and contrastive learning for robotic control.

Findings

01 LIV outperforms prior state representations in imitation learning.

02 LIV improves reward specification for policy synthesis.

03 LIV generalizes to unseen environments and tasks.

Abstract

We present Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations. Exploiting a novel connection between dual reinforcement learning and mutual information contrastive learning, the LIV objective trains a multi-modal representation that implicitly encodes a universal value function for tasks specified as language or image goals. We use LIV to pre-train the first control-centric vision-language representation from large human video datasets such as EpicKitchen. Given only a language or image goal, the pre-trained LIV model can assign dense rewards to each frame in videos of unseen robots or humans attempting that task in unseen environments. Further, when some target domain-specific data is available, the same objective can be used to fine-tune and improve LIV and even other pre-trained…
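The abstract's idea of scoring each video frame against a language or image goal can be illustrated with a small sketch. It assumes, for illustration only, that frames and the goal are already embedded in a shared space and that the per-frame score is the cosine similarity between the two embeddings. The toy 2-D vectors stand in for encoder outputs, and the function names `cosine_similarity` and `frame_scores` are hypothetical; nothing here calls the released LIV model or reproduces its training objective.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def frame_scores(frame_embeddings: Sequence[Sequence[float]],
                 goal_embedding: Sequence[float]) -> List[float]:
    """Assign one score per frame: similarity of that frame's embedding to the goal's.

    Assumption: embeddings come from some pre-trained encoder that maps
    images and text goals into the same space (not implemented here).
    """
    return [cosine_similarity(f, goal_embedding) for f in frame_embeddings]


# Toy stand-ins for encoder outputs: the frames drift toward the goal direction.
goal = [1.0, 0.0]
frames = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]

scores = frame_scores(frames, goal)
for t, s in enumerate(scores):
    print(f"frame {t}: score = {s:.3f}")
```

In this toy trajectory the scores rise as the frames approach the goal, which is the qualitative behavior a dense per-frame reward is meant to capture.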

Code & Models

Repositories

penn-pal-lab/liv
PyTorch · Official

Models

🤗 jasonyma/LIV
model · 122 downloads · 2 likes

Videos

LIV: Language-Image Representations and Rewards for Robotic Control · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · interferon and immune responses · Domain Adaptation and Few-Shot Learning