The Ingredients for Robotic Diffusion Transformers

Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine

arXiv:2410.10088 · cs.RO · October 15, 2024

TL;DR

This paper introduces a novel diffusion transformer architecture for robotics that improves task performance and scalability, reducing the need for extensive hyper-parameter tuning across diverse robotic tasks and embodiments.

Contribution

The paper identifies key design choices for diffusion transformer policies, proposes an improved diffusion transformer architecture built on those findings, and demonstrates superior performance on long-horizon robotic tasks.

Findings

01

Outperforms the state of the art on long-horizon dexterous tasks

02

Shows improved scaling with multi-modal, language-annotated data

03

Reduces the need for per-setup hyper-parameter tuning across diverse robots

Abstract

In recent years roboticists have achieved remarkable progress in solving increasingly general tasks on dexterous robotic hardware by leveraging high capacity Transformer network architectures and generative diffusion models. Unfortunately, combining these two orthogonal improvements has proven surprisingly difficult, since there is no clear and well-understood process for making important design choices. In this paper, we identify, study and improve key architectural design decisions for high-capacity diffusion transformer policies. The resulting models can efficiently solve diverse tasks on multiple robot embodiments, without the excruciating pain of per-setup hyper-parameter tuning. By combining the results of our investigation with our improved model components, we are able to present a novel architecture, named \method, that significantly outperforms the state of the art in solving…
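The abstract centers on combining Transformer backbones with diffusion-based action generation. As a rough illustration only, the sketch below shows a generic DDPM-style sampling loop that iteratively denoises a chunk of future actions using a tiny cross-attention noise predictor conditioned on observation tokens. Everything in it is a hypothetical assumption for illustration (the class `ToyCrossAttentionDenoiser`, the linear beta schedule, the sinusoidal timestep embedding, and all dimensions); it is not the architecture proposed in the paper, and its weights are random and untrained.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def make_ddpm_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule (an assumption) and the derived alpha terms used by DDPM
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

class ToyCrossAttentionDenoiser:
    """Hypothetical one-layer noise predictor: action tokens plus a timestep
    embedding attend to observation tokens. Random, untrained weights."""

    def __init__(self, action_dim, obs_dim, d_model=32, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        self.d_model = d_model
        self.w_in = rng.standard_normal((action_dim, d_model)) * s
        self.w_q = rng.standard_normal((d_model, d_model)) * s
        self.w_k = rng.standard_normal((obs_dim, d_model)) * s
        self.w_v = rng.standard_normal((obs_dim, d_model)) * s
        self.w_out = rng.standard_normal((d_model, action_dim)) * s

    def timestep_embedding(self, t):
        # Sinusoidal embedding of the diffusion step (length d_model)
        half = self.d_model // 2
        freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
        return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

    def __call__(self, noisy_actions, t, obs_tokens):
        h = noisy_actions @ self.w_in + self.timestep_embedding(t)  # (H, d)
        q = h @ self.w_q                                            # (H, d)
        k = obs_tokens @ self.w_k                                   # (N, d)
        v = obs_tokens @ self.w_v                                   # (N, d)
        attn = softmax(q @ k.T / np.sqrt(self.d_model))             # (H, N)
        h = h + attn @ v                                            # residual connection
        return h @ self.w_out                                       # predicted noise, (H, action_dim)

def sample_action_chunk(denoiser, obs_tokens, horizon, action_dim, num_steps=50, seed=0):
    # Start from Gaussian noise and apply the DDPM reverse update at each step
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_ddpm_schedule(num_steps)
    x = rng.standard_normal((horizon, action_dim))
    for t in reversed(range(num_steps)):
        eps_hat = denoiser(x, t, obs_tokens)
        # Posterior mean: x_{t-1} = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Simple variance choice sigma_t^2 = beta_t
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x
```

For example, `sample_action_chunk(ToyCrossAttentionDenoiser(7, 16), obs, horizon=10, action_dim=7)` returns a `(10, 7)` array of actions given a `(num_tokens, 16)` observation array `obs`. Predicting a whole chunk of actions per call, rather than a single step, is one common design in diffusion policies; whether and how the paper does this is not stated in this summary.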

Taxonomy

Topics: Manufacturing Process and Optimization

Methods: Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer