SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
Isabel Leal, Krzysztof Choromanski, Deepali Jain, Avinava Dubey, Jake Varley, Michael Ryoo, Yao Lu, Frederick Liu, Vikas Sindhwani, Quan Vuong, Tamas Sarlos, Ken Oslund, Karol Hausman, Kanishka Rao

TL;DR
SARA-RT introduces a novel fine-tuning method called up-training that efficiently converts large, quadratic-time Robotics Transformers into linear-attention models, enabling faster on-robot deployment without sacrificing performance.
Contribution
The paper proposes up-training, a new fine-tuning approach that transforms pre-trained Robotics Transformers into efficient linear-attention models for practical deployment.
Findings
Speeds up RT-2 vision-language-action models
Accelerates Point Cloud Transformer policies
Maintains high quality after conversion
Abstract
We present Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT): a new paradigm for addressing the emerging challenge of scaling up Robotics Transformers (RT) for on-robot deployment. SARA-RT relies on a new fine-tuning method that we propose, called up-training. It converts pre-trained or already fine-tuned Transformer-based robotic policies of quadratic time complexity (including massive billion-parameter vision-language-action models, or VLAs) into their efficient linear-attention counterparts while maintaining high quality. We demonstrate the effectiveness of SARA-RT by speeding up: (a) the class of recently introduced RT-2 models, the first VLA robotic policies pre-trained on internet-scale data, as well as (b) Point Cloud Transformer (PCT) robotic policies operating on large point clouds. We complement our results with the rigorous mathematical analysis providing…
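To make the quadratic-versus-linear distinction in the abstract concrete, the following is a minimal NumPy sketch of generic kernelized linear attention. It is not the SARA-RT mechanism: the abstract does not specify the feature map, so the ReLU-based `feature_map` below is an assumption chosen only for illustration, and all function names are hypothetical. The sketch shows that reordering the matrix products avoids building the L x L attention matrix while giving the same result as the explicit quadratic form of the same kernel.

```python
import numpy as np


def softmax_attention(Q, K, V):
    """Standard attention. Builds an L x L matrix, so cost grows as O(L^2)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V


def feature_map(X):
    """Assumed feature map (illustrative only): ReLU plus a small positive
    constant so that every attention normalizer is strictly positive."""
    return np.maximum(X, 0.0) + 1e-6


def quadratic_kernel_attention(Q, K, V):
    """Same kernel as linear_attention, computed the slow way via an
    explicit L x L matrix. Used here only as a reference."""
    A = feature_map(Q) @ feature_map(K).T          # (L, L)
    A = A / A.sum(axis=-1, keepdims=True)
    return A @ V


def linear_attention(Q, K, V):
    """Kernelized attention in O(L) time: compute phi(K)^T V first, so no
    L x L matrix is ever formed."""
    Qp, Kp = feature_map(Q), feature_map(K)
    KV = Kp.T @ V                                  # (m, d_v), independent of L
    Z = Kp.sum(axis=0)                             # (m,), normalizer terms
    return (Qp @ KV) / (Qp @ Z)[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d, d_v = 128, 16, 8
    Q = rng.standard_normal((L, d))
    K = rng.standard_normal((L, d))
    V = rng.standard_normal((L, d_v))
    fast = linear_attention(Q, K, V)
    slow = quadratic_kernel_attention(Q, K, V)
    print("max abs difference:", np.abs(fast - slow).max())
```

Note that the linear variant matches the quadratic form of its own kernel, not softmax attention; closing that quality gap in a pre-trained model is the role the abstract assigns to up-training.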
Taxonomy
Topics: Advanced Neural Network Applications · Age of Information Optimization · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Dense Connections · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer
