SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention

Isabel Leal; Krzysztof Choromanski; Deepali Jain; Avinava Dubey; Jake Varley; Michael Ryoo; Yao Lu; Frederick Liu; Vikas Sindhwani; Quan Vuong; Tamas Sarlos; Ken Oslund; Karol Hausman; Kanishka Rao

arXiv:2312.01990·cs.RO·December 5, 2023·1 cites

Open Access

TL;DR

SARA-RT introduces a novel fine-tuning method called up-training that efficiently converts large, quadratic-time Robotics Transformers into linear-attention models, enabling faster on-robot deployment without sacrificing performance.

Contribution

The paper proposes up-training, a new fine-tuning approach that transforms pre-trained Robotics Transformers into efficient linear-attention models for practical deployment.

Findings

01 Speeds up RT-2 vision-language-action models

02 Accelerates Point Cloud Transformer policies

03 Maintains high quality after conversion

Abstract

We present Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT): a new paradigm for addressing the emerging challenge of scaling up Robotics Transformers (RT) for on-robot deployment. SARA-RT relies on a new fine-tuning method that we propose, called up-training. It converts pre-trained or already fine-tuned Transformer-based robotic policies of quadratic time complexity (including massive billion-parameter vision-language-action models, or VLAs) into their efficient linear-attention counterparts while maintaining high quality. We demonstrate the effectiveness of SARA-RT by speeding up: (a) the class of recently introduced RT-2 models, the first VLA robotic policies pre-trained on internet-scale data, and (b) Point Cloud Transformer (PCT) robotic policies operating on large point clouds. We complement our results with the rigorous mathematical analysis providing…
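The abstract contrasts quadratic-time attention with linear-attention counterparts. The following is a minimal NumPy sketch of that general distinction only; it is not SARA-RT's attention mechanism and does not show up-training. The feature map `feature_map` (a ReLU with a small positive offset) and all function names are hypothetical choices made for illustration. The point it tries to show is that a kernelized attention can be evaluated without ever forming the L x L attention matrix, by multiplying keys and values first.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an L x L matrix, O(L^2 * d) time."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def feature_map(X):
    """Hypothetical positive feature map (assumption, not taken from the paper)."""
    return np.maximum(X, 0.0) + 1e-6

def linear_attention(Q, K, V):
    """Kernelized attention without the L x L matrix, O(L * d * d_v) time."""
    Qp, Kp = feature_map(Q), feature_map(K)
    kv = Kp.T @ V         # (d, d_v): summary of keys and values, built once
    z = Kp.sum(axis=0)    # (d,): normalizer shared by all queries
    return (Qp @ kv) / (Qp @ z)[:, None]

def linear_attention_explicit(Q, K, V):
    """Same kernel, but forming the L x L matrix; used only as a check."""
    Qp, Kp = feature_map(Q), feature_map(K)
    A = Qp @ Kp.T
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d, d_v = 512, 16, 8  # illustrative sizes only
    Q, K, V = rng.normal(size=(L, d)), rng.normal(size=(L, d)), rng.normal(size=(L, d_v))
    fast = linear_attention(Q, K, V)
    slow = linear_attention_explicit(Q, K, V)
    print("linear == explicit:", np.allclose(fast, slow))
```

Because the two kernelized versions agree numerically, the reordering changes only the cost, not the result. The softmax version generally gives different outputs from the kernelized one, which is presumably why a converted model needs further fine-tuning to recover quality.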

Taxonomy

Topics: Advanced Neural Network Applications · Age of Information Optimization · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Dense Connections · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer