PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training

Rogerio Bonatti; Sai Vemprala; Shuang Ma; Felipe Frujeri; Shuhang Chen; Ashish Kapoor

arXiv:2209.11133·cs.RO·September 27, 2022

TL;DR

This paper introduces PACT, a transformer-based pre-training method for robots that learns general representations from data, enabling efficient multi-task adaptation for navigation, localization, and mapping.

Contribution

The work presents a novel self-supervised pre-training approach for robots using a causal transformer, facilitating multi-task learning with improved efficiency and performance.

Findings

01. Pretrained PACT models outperform training from scratch on multiple tasks.

02. Finetuning small task-specific networks yields significant performance gains.

03. Shared representations reduce model capacity and improve deployment speed.

Abstract

Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge. Inspired by large pre-trained language models, this work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot. We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion. Through autoregressive prediction of states and actions over time, our model implicitly encodes dynamics and behaviors for a particular robot. Our experimental evaluation focuses on the domain of mobile agents, where we show that this robot-specific representation can function as a single starting…
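To make the phrase "autoregressive prediction of states and actions over time" concrete, here is a minimal sketch of how such a sequence could be laid out. It is an illustration, not the authors' implementation: the interleaving order (state, then action), the helper names `interleave`, `causal_mask`, and `next_token_pairs`, and the use of plain strings as stand-ins for token embeddings are all assumptions.

```python
from typing import Any, List, Tuple

Token = Tuple[str, Any]  # (kind, payload); kind is "state" or "action"


def interleave(states: List[Any], actions: List[Any]) -> List[Token]:
    """Build the sequence s_0, a_0, s_1, a_1, ... (assumed ordering)."""
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    seq: List[Token] = []
    for s, a in zip(states, actions):
        seq.append(("state", s))
        seq.append(("action", a))
    return seq


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def next_token_pairs(seq: List[Token]) -> List[Tuple[Token, Token]]:
    """Pair each token with the token that follows it (the prediction target)."""
    return list(zip(seq[:-1], seq[1:]))


if __name__ == "__main__":
    seq = interleave(["s0", "s1"], ["a0", "a1"])
    for row in causal_mask(len(seq)):
        print(["x" if allowed else "." for allowed in row])
    for context_end, target in next_token_pairs(seq):
        print(context_end, "->", target)
```

Under this layout, predicting an action from a state resembles learning a behavior policy, and predicting the next state from an action resembles learning dynamics, which matches the abstract's claim that the model implicitly encodes both.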

Taxonomy

Topics: Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications · Modular Robots and Swarm Intelligence

Methods: Attention Is All You Need · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Label Smoothing