Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

Mohit Shridhar; Lucas Manuelli; Dieter Fox

arXiv:2209.05451 · cs.RO · November 14, 2022 · 48 cites

TL;DR

Perceiver-Actor is a multi-task transformer model that leverages 3D voxel observations and language goals to efficiently learn diverse robotic manipulation tasks from limited data.

Contribution

This paper introduces PerAct, a language-conditioned behavior-cloning agent that encodes RGB-D voxel observations and language goals with a Perceiver Transformer for multi-task 6-DoF manipulation. It outperforms image-to-action agents and 3D ConvNet baselines when trained from only a few demonstrations.

Findings

01. Outperforms image-to-action agents and 3D ConvNets on various tasks
02. Learns 18 RLBench tasks and 7 real-world tasks with few demonstrations
03. Effectively encodes language goals and 3D observations for manipulation

Abstract

Transformers have revolutionized vision and natural language processing with their ability to scale with large datasets. But in robotic manipulation, data is both limited and expensive. Can manipulation still benefit from Transformers with the right problem formulation? We investigate this question with PerAct, a language-conditioned behavior-cloning agent for multi-task 6-DoF manipulation. PerAct encodes language goals and RGB-D voxel observations with a Perceiver Transformer, and outputs discretized actions by "detecting the next best voxel action". Unlike frameworks that operate on 2D images, the voxelized 3D observation and action space provides a strong structural prior for efficiently learning 6-DoF actions. With this formulation, we train a single multi-task Transformer for 18 RLBench tasks (with 249 variations) and 7 real-world tasks (with 18 variations) from just a few…
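The idea of a voxelized action space, in which the agent "detects" the next best voxel, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the grid resolution `GRID`, the workspace bounds, and the helper names `point_to_voxel`, `voxel_to_point`, and `best_voxel` are all assumptions made for illustration, and the per-voxel scores are stand-ins for whatever the model would predict.

```python
import numpy as np

GRID = 100  # assumed number of voxels per axis (not stated in the text above)
BOUNDS_MIN = np.array([-0.5, -0.5, 0.0])  # hypothetical workspace bounds, metres
BOUNDS_MAX = np.array([0.5, 0.5, 1.0])


def point_to_voxel(point):
    """Discretize a continuous 3D point into integer voxel indices."""
    scale = GRID / (BOUNDS_MAX - BOUNDS_MIN)
    idx = np.floor((np.asarray(point, dtype=float) - BOUNDS_MIN) * scale).astype(int)
    # Points on or beyond the upper boundary are clamped into the last voxel.
    return np.clip(idx, 0, GRID - 1)


def voxel_to_point(idx):
    """Return the centre of a voxel in continuous workspace coordinates."""
    size = (BOUNDS_MAX - BOUNDS_MIN) / GRID
    return BOUNDS_MIN + (np.asarray(idx) + 0.5) * size


def best_voxel(scores):
    """Select the highest-scoring voxel from a (GRID, GRID, GRID) score array."""
    flat = int(np.argmax(scores))
    return np.array(np.unravel_index(flat, scores.shape))


# Usage: pretend a model produced per-voxel scores, then decode a translation.
scores = np.zeros((GRID, GRID, GRID))
scores[10, 20, 30] = 1.0
target_idx = best_voxel(scores)
target_xyz = voxel_to_point(target_idx)
```

Under these assumptions, predicting a translation reduces to a classification over voxels, and the continuous target is recovered as the centre of the selected voxel. Rotation and gripper state would need their own discretization, which this sketch leaves out.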

Code & Models

Repositories

peract/peract (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Absolute Position Encodings · Dropout · Dense Connections