VLA Knows Its Limits

Haoxuan Wang; Gengyu Zhang; Yan Yan; Ramana Rao Kompella; Gaowen Liu

arXiv:2602.21445·cs.RO·February 26, 2026

Open Access

TL;DR

This paper studies how the execution horizon affects flow-based Vision-Language-Action (VLA) models. It shows that performance first improves and then declines as the horizon increases, and it proposes AutoHorizon, a method that estimates the execution horizon dynamically at test time.

Contribution

The paper introduces AutoHorizon, described as the first method to dynamically estimate the execution horizon at test time, improving adaptability and performance in flow-based VLA models.

Findings

01

Performance varies substantially with the execution horizon, first improving and then declining as the horizon increases.

02

AutoHorizon adapts the execution horizon at test time, improving performance on robotic manipulation tasks.

03

AutoHorizon generalizes across tasks and models with minimal overhead.

Abstract

Action chunking has recently emerged as a standard practice in flow-based Vision-Language-Action (VLA) models. However, the effect and choice of the execution horizon (the number of actions to be executed from each predicted chunk) remain underexplored. In this work, we first show that varying the execution horizon leads to substantial performance deviations, with performance initially improving and then declining as the horizon increases. To uncover the reasons, we analyze the cross- and self-attention weights in flow-based VLAs and reveal two key phenomena: (i) intra-chunk actions attend invariantly to vision-language tokens, limiting adaptability to environmental changes; and (ii) the initial and terminal action tokens serve as stable anchors, forming latent centers around which intermediate actions are organized. Motivated by these insights, we interpret action self-attention…
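To make the term "execution horizon" concrete, here is a minimal Python sketch of action chunking with a fixed horizon: the policy predicts a chunk of actions, only the first few are executed, and the policy is then queried again on the new observation. This is not AutoHorizon and is not taken from the paper. The names `run_with_execution_horizon`, `toy_policy`, and `toy_step`, the scalar observation, and the chunk length of 8 are all assumptions made purely for illustration.

```python
from typing import Callable, List, Tuple


def run_with_execution_horizon(
    predict_chunk: Callable[[float], List[float]],
    step: Callable[[float, float], float],
    initial_obs: float,
    execution_horizon: int,
    total_steps: int,
) -> Tuple[float, int]:
    """Execute the first `execution_horizon` actions of each chunk, then re-predict.

    Returns the final observation and the number of policy queries made.
    """
    obs = initial_obs
    executed = 0
    num_predictions = 0
    while executed < total_steps:
        chunk = predict_chunk(obs)  # one (expensive) policy query
        num_predictions += 1
        if not 1 <= execution_horizon <= len(chunk):
            raise ValueError("execution_horizon must be in [1, len(chunk)]")
        # Only a prefix of the chunk is executed; the rest is discarded.
        for action in chunk[:execution_horizon]:
            if executed >= total_steps:
                break
            obs = step(obs, action)
            executed += 1
    return obs, num_predictions


# Hypothetical stand-ins for a VLA policy and an environment.
def toy_policy(obs: float) -> List[float]:
    # Predict a chunk of 8 equal actions that would bring `obs` to zero.
    return [-obs / 8.0] * 8


def toy_step(obs: float, action: float) -> float:
    return obs + action


if __name__ == "__main__":
    for horizon in (1, 4, 8):
        final_obs, queries = run_with_execution_horizon(
            toy_policy, toy_step, initial_obs=1.0,
            execution_horizon=horizon, total_steps=16,
        )
        print(f"horizon={horizon} policy_queries={queries} final_obs={final_obs:.4f}")
```

In this toy setup, a short horizon queries the policy often and reacts quickly to new observations, while a long horizon queries it rarely and commits to more of each chunk. A test-time method such as the one the abstract describes would choose `execution_horizon` per chunk rather than fixing it in advance.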

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI