What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data

Oier Mees; Lukas Hermann; Wolfram Burgard

arXiv:2204.06252·cs.RO·August 31, 2022

TL;DR

This paper investigates critical challenges in language-conditioned robotic imitation learning from offline data, proposing architectural improvements and a novel model that significantly outperforms existing methods on complex manipulation benchmarks.

Contribution

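The paper analyzes the design choices behind language-conditioned imitation learning and presents a new model that combines hierarchical control, a multimodal transformer encoder, discrete latent plans, and a contrastive loss aligning video and language. The model improves on prior results on the CALVIN benchmark.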

Findings

01 Significant performance improvements on the CALVIN benchmark

02 Effective use of hierarchical decomposition and multimodal transformers

03 Open-sourced code and models for future research

Abstract

A long-standing goal in robotics is to build robots that can perform a wide range of daily tasks from perceptions obtained with their onboard sensors and specified only via natural language. While recently substantial advances have been achieved in language-driven robotics by leveraging end-to-end learning from pixels, there is no clear and well-understood process for making various design choices due to the underlying variation in setups. In this paper, we conduct an extensive study of the most critical challenges in learning language conditioned policies from offline free-form imitation datasets. We further identify architectural and algorithmic techniques that improve performance, such as a hierarchical decomposition of the robot control learning, a multimodal transformer encoder, discrete latent plans and a self-supervised contrastive loss that aligns video and language…
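The abstract names a self-supervised contrastive loss that aligns video and language, but this excerpt does not give its exact form. As a rough illustration only, the sketch below assumes a symmetric InfoNCE-style objective over a batch of paired video and language embeddings. The function names, the temperature value, and the plain-list embedding format are all hypothetical and are not taken from the paper or its code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(video_emb, lang_emb, temperature=0.07):
    """Assumed InfoNCE-style loss: row i of each input is a matching pair.

    The temperature default is an illustrative choice, not a value from the paper.
    """
    v = [l2_normalize(e) for e in video_emb]
    t = [l2_normalize(e) for e in lang_emb]
    n = len(v)
    # logits[i][j] = scaled cosine similarity of video i and instruction j
    logits = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Video-to-language direction: each video should pick its own instruction.
    v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Language-to-video direction: same computation on the transposed matrix.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (v2t + t2v)


# Toy batch: two video embeddings and their matching instruction embeddings.
video = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_contrastive_loss(video, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_contrastive_loss(video, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings give a much lower loss than the shuffled pairing, which is the property such an objective relies on to pull matching video and language representations together.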

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition