AttenA+: Rectifying Action Inequality in Robotic Foundation Models

Daojie Peng; Fulong Ma; Jiahang Cao; Qiang Zhang; Xupeng Xie; Jian Guo; Ping Luo; Andrew F. Luo; Boyu Zhou; Jun Ma

arXiv:2605.13548 · cs.RO · May 14, 2026

TL;DR

AttenA+ improves robotic foundation models by using velocity to identify critical action segments and emphasizing them during training. This aligns optimization with the physical demands of the task and improves performance on benchmarks and real-world tasks.

Contribution

Introduces AttenA+, a velocity-driven attention framework that shifts training emphasis toward critical action segments without changing the model architecture.

Findings

01. Improves LIBERO benchmark accuracy to 98.6%.

02. Enhances RoboTwin 2.0 performance to 92.4%.

03. Demonstrates robustness on real-world robotic tasks.

Abstract

Existing robotic foundation models, while powerful, are predicated on an implicit assumption of temporal homogeneity: treating all actions as equally informative during optimization. This "flat" training paradigm, inherited from language modeling, remains indifferent to the underlying physical hierarchy of manipulation. In reality, robot trajectories are fundamentally heterogeneous, where low-velocity segments often dictate task success through precision-demanding interactions, while high-velocity motions serve as error-tolerant transitions. Such a misalignment between uniform loss weighting and physical criticality fundamentally limits the performance of current Vision-Language-Action (VLA) models and World-Action Models (WAM) in complex, long-horizon tasks. To rectify this, we introduce AttenA+, an architecture-agnostic framework that prioritizes kinematically critical segments via…
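The idea of weighting low-velocity segments more heavily than high-velocity ones can be illustrated with a small sketch. The abstract is truncated before it describes the actual mechanism, so everything below is an assumption made for illustration: the function names `velocity_weights` and `weighted_action_loss`, the finite-difference speed estimate, the softmax over negative speed, and the `temperature` parameter are hypothetical and are not taken from the paper.

```python
import math


def velocity_weights(actions, dt=1.0, temperature=1.0):
    """Return one weight per timestep, larger where the motion is slower.

    actions: list of equal-length position vectors, one per timestep.
    Hypothetical scheme (not the paper's): softmax over negative speed,
    rescaled so that the weights average to 1.
    """
    n = len(actions)
    if n < 2:
        return [1.0] * n
    # Finite-difference speed between consecutive timesteps.
    speeds = [math.dist(actions[t + 1], actions[t]) / dt for t in range(n - 1)]
    speeds.append(speeds[-1])  # reuse the last speed for the final timestep
    # Softmax over negative speed: slow steps receive larger weights.
    logits = [-s / temperature for s in speeds]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [n * e / total for e in exps]


def weighted_action_loss(pred, target, weights):
    """Weighted mean over timesteps of the summed squared error."""
    per_step = [
        sum((p - q) ** 2 for p, q in zip(p_vec, q_vec))
        for p_vec, q_vec in zip(pred, target)
    ]
    return sum(w * e for w, e in zip(weights, per_step)) / len(per_step)
```

With a constant-speed trajectory every weight equals 1 and the loss reduces to the usual uniform ("flat") mean squared error, so this sketch changes only the loss weighting and leaves the model untouched. A larger `temperature` moves the weights back toward uniform.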

Code & Models

Models

wsagi/X-VLA-PickOrange (Hugging Face model, 69 downloads)
