Imitating Task and Motion Planning with Visuomotor Transformers
Murtaza Dalal, Ajay Mandlekar, Caelan Garrett, Ankur Handa, Ruslan Salakhutdinov, Dieter Fox

TL;DR
This paper introduces OPTIMUS, a novel imitation learning system that leverages large-scale datasets generated by Task and Motion Planning (TAMP) and employs visuomotor Transformers to enable robots to perform diverse manipulation tasks with high success rates.
Contribution
The paper presents a new pipeline for generating TAMP datasets tailored for imitation learning and demonstrates the effectiveness of Transformer-based policies trained on this data.
Findings
OPTIMUS achieves 70-80% success rates on various manipulation tasks.
Large-scale TAMP-generated datasets improve imitation learning performance.
Transformer policies can effectively learn complex manipulation skills.
Abstract
Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation. To that end, we present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is specifically curated for imitation learning and can be used to…
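The idea of imitating an automated planner can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 1-D reaching task, the `toy_planner_action` stand-in for a TAMP supervisor, the success filter as one plausible reading of "curated", and the one-parameter least-squares policy, which stands in for the visuomotor Transformer only to keep the example dependency-free.

```python
import random

def toy_planner_action(pos, goal, max_step=0.1):
    """Hypothetical stand-in for a TAMP supervisor: step toward the goal, clipped."""
    delta = goal - pos
    return max(-max_step, min(max_step, delta))

def collect_demo(rng, horizon=30, noise=0.02, tol=0.05):
    """Roll out the planner once and record (observation, action) pairs."""
    pos = rng.uniform(-1.0, 1.0)
    goal = rng.uniform(-1.0, 1.0)
    traj = []
    for _ in range(horizon):
        action = toy_planner_action(pos, goal)
        traj.append(((pos, goal), action))
        pos += action + rng.gauss(0.0, noise)  # noisy execution
    success = abs(pos - goal) <= tol
    return traj, success

def build_dataset(num_demos, seed=0):
    """Keep only successful rollouts (assumed curation rule)."""
    rng = random.Random(seed)
    dataset, kept = [], 0
    for _ in range(num_demos):
        traj, success = collect_demo(rng)
        if success:
            dataset.extend(traj)
            kept += 1
    return dataset, kept

def fit_linear_policy(dataset):
    """Behavior cloning with a single weight: action ~ w * (goal - pos)."""
    num = sum((goal - pos) * action for (pos, goal), action in dataset)
    den = sum((goal - pos) ** 2 for (pos, goal), _ in dataset)
    return num / den

dataset, kept = build_dataset(200)
w = fit_linear_policy(dataset)
print(f"kept {kept} of 200 demonstrations, {len(dataset)} pairs, w = {w:.3f}")
```

The structure is what matters: a scripted supervisor produces demonstrations without human labor, failed rollouts are discarded, and a policy is fit by supervised regression on the surviving pairs. In the paper's setting the observations are images and the policy is a Transformer, neither of which this sketch attempts to model.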
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam · Dense Connections
