Imitating Task and Motion Planning with Visuomotor Transformers

Murtaza Dalal; Ajay Mandlekar; Caelan Garrett; Ankur Handa; Ruslan Salakhutdinov; Dieter Fox

arXiv:2305.16309 · cs.RO · October 18, 2023 · 6 cites

Open Access

TL;DR

This paper introduces OPTIMUS, a novel imitation learning system that leverages large-scale datasets generated by Task and Motion Planning (TAMP) and employs visuomotor Transformers to enable robots to perform diverse manipulation tasks with high success rates.

Contribution

The paper presents a new pipeline for generating TAMP datasets tailored for imitation learning and demonstrates the effectiveness of Transformer-based policies trained on this data.

Findings

01. OPTIMUS achieves 70-80% success rates on various manipulation tasks.

02. Large-scale TAMP-generated datasets improve imitation learning performance.

03. Transformer policies can effectively learn complex manipulation skills.

Abstract

Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation. To that end, we present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is specifically curated for imitation learning and can be used to…
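The core idea in the abstract, imitating an automated supervisor rather than a human, can be illustrated with a toy sketch. The example below is not OPTIMUS and makes several simplifying assumptions: a hypothetical proportional controller (`expert_action`) stands in for the TAMP supervisor, the state is a single 1D position, and a linear policy fitted by least squares stands in for the visuomotor Transformer. All function names and constants are illustrative and do not come from the paper.

```python
import random

EXPERT_GAIN = 0.5  # hypothetical gain for the toy expert, not from the paper


def expert_action(position, goal):
    # Toy stand-in for a planner: move a fixed fraction of the way to the goal.
    return EXPERT_GAIN * (goal - position)


def collect_demonstrations(num_episodes, horizon, rng):
    # Roll out the expert from random starts and record (observation, action) pairs.
    data = []
    for _ in range(num_episodes):
        position, goal = rng.uniform(-1, 1), rng.uniform(-1, 1)
        for _ in range(horizon):
            action = expert_action(position, goal)
            data.append((goal - position, action))
            position += action
    return data


def fit_linear_policy(data):
    # Behavior cloning reduced to ordinary least squares: action ~ w * obs + b.
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def success_rate(w, b, num_episodes, horizon, rng, tolerance=0.01):
    # Run the learned policy from fresh random starts and count goal arrivals.
    successes = 0
    for _ in range(num_episodes):
        position, goal = rng.uniform(-1, 1), rng.uniform(-1, 1)
        for _ in range(horizon):
            position += w * (goal - position) + b
        successes += abs(goal - position) < tolerance
    return successes / num_episodes


rng = random.Random(0)
demos = collect_demonstrations(num_episodes=200, horizon=10, rng=rng)
w, b = fit_linear_policy(demos)
rate = success_rate(w, b, num_episodes=100, horizon=20, rng=rng)
print(f"w={w:.3f} b={b:.3f} success_rate={rate:.2f}")
```

Because the supervisor is a program, the number of demonstrations is limited only by compute, which is the scaling argument the abstract makes against human data collection. A real system would replace the scalar observation with images and proprioception and the linear fit with a learned sequence model.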

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam · Dense Connections