RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov

TL;DR
This paper introduces the Robotics Transformer, a high-capacity model trained on large-scale, diverse robotic data, demonstrating improved generalization and scalability for real-world robotic control tasks.
Contribution
It presents a novel Transformer-based model architecture for robotics, emphasizing open-ended, task-agnostic training and scalability in data and model size.
Findings
The Robotics Transformer generalizes well across tasks with diverse data.
Model performance improves with increased data and model size.
Open-ended training enhances robotic control capabilities.
Abstract
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a…
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Softmax · Layer Normalization · Dropout · Byte Pair Encoding · Linear Layer
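Several of the methods tagged above (multi-head attention, softmax, residual connections, layer normalization, and the position-wise feed-forward layer) combine into a single generic Transformer block. The NumPy sketch below shows one way they fit together. It is an illustration of the standard block only, not the RT-1 architecture itself: the function names, the `init_params` helper, the ReLU activation, and all sizes are assumptions chosen for the example, and dropout and label smoothing are omitted because they only matter during training.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance
    # (learned scale and shift are left out of this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)          # (heads, seq, seq)
    heads = weights @ v                         # (heads, seq, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ wo, weights


def transformer_block(x, params, num_heads):
    # Attention sub-layer with residual connection, then layer norm.
    attn_out, weights = multi_head_attention(
        x, params["wq"], params["wk"], params["wv"], params["wo"], num_heads
    )
    x = layer_norm(x + attn_out)
    # Position-wise feed-forward sub-layer: two linear layers with ReLU.
    hidden = np.maximum(0.0, x @ params["w1"] + params["b1"])
    x = layer_norm(x + hidden @ params["w2"] + params["b2"])
    return x, weights


def init_params(d_model, d_ff, rng):
    # Hypothetical helper: small random weights for demonstration only.
    def w(shape):
        return rng.normal(scale=0.1, size=shape)

    return {
        "wq": w((d_model, d_model)), "wk": w((d_model, d_model)),
        "wv": w((d_model, d_model)), "wo": w((d_model, d_model)),
        "w1": w((d_model, d_ff)), "b1": np.zeros(d_ff),
        "w2": w((d_ff, d_model)), "b2": np.zeros(d_model),
    }
```

Calling `transformer_block(x, init_params(8, 16, rng), num_heads=2)` on a `(5, 8)` input returns an output of the same shape together with per-head attention weights of shape `(2, 5, 5)`. `d_model` must be divisible by `num_heads` for the head split to work.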
