Language-Driven Representation Learning for Robotics

Siddharth Karamcheti; Suraj Nair; Annie S. Chen; Thomas Kollar; Chelsea Finn; Dorsa Sadigh; Percy Liang

arXiv:2302.12766·cs.RO·February 27, 2023·5 cites

TL;DR

This paper introduces Voltron, a language-driven representation learning framework that uses videos of humans and their captions to learn visual representations for diverse robot learning tasks, outperforming prior methods across the five tasks evaluated.

Contribution

The paper presents Voltron, a new framework combining visual reconstruction and language grounding, and provides a comprehensive evaluation suite for robotic visual representations.

Findings

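01. Voltron outperforms prior state-of-the-art methods across five robotic tasks.

02. Language-driven features improve high-level semantic understanding in robotic perception.

03. Existing representations yield inconsistent results across robot learning tasks: masked autoencoding captures low-level spatial features at the cost of high-level semantics, while contrastive learning does the opposite.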

Abstract

Recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks. Leveraging methods such as masked autoencoding and contrastive learning, these representations exhibit strong transfer to policy learning for visuomotor control. But, robot learning encompasses a diverse set of problems beyond control including grasp affordance prediction, language-conditioned imitation learning, and intent scoring for human-robot collaboration, amongst others. First, we demonstrate that existing representations yield inconsistent results across these tasks: masked autoencoding approaches pick up on low-level spatial features at the cost of high-level semantics, while contrastive learning approaches capture the opposite. We then introduce Voltron, a framework for language-driven representation learning from…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Contrastive Learning