Dynamic Execution Commitment of Vision-Language-Action Models

Feng Chen; Xianghui Wang; Yuxuan Chen; Boying Li; Yefei He; Zeyu Zhang; Yicheng Wu

arXiv:2605.11567 · cs.CV · May 19, 2026

TL;DR

This paper introduces A3, an adaptive mechanism for vision-language-action models that dynamically determines execution horizons based on self-verification, improving robustness and efficiency without manual tuning.

Contribution

A3 reframes execution commitment as a self-speculative verification problem, enabling dynamic, reliable, and efficient action execution in vision-language-action models.

Findings

01. A3 eliminates manual horizon tuning across diverse models.

02. A3 achieves better robustness and throughput trade-offs.

03. A3 maintains physical rollout integrity through prefix verification.

Abstract

Vision-Language-Action (VLA) models predominantly adopt action chunking, i.e., predicting and committing to a short horizon of consecutive low-level actions in a single forward pass, to amortize the inference cost of large-scale backbones and reduce per-step latency. However, committing these multi-step predictions to real-world execution requires balancing success rate against inference efficiency, a decision typically governed by fixed execution horizons tuned per task. Such heuristics ignore the state-dependent nature of predictive reliability, leading to brittle performance in dynamic or out-of-distribution settings. In this paper, we introduce A3, an Adaptive Action Acceptance mechanism that reframes dynamic execution commitment as a self-speculative prefix verification problem. A3 first computes a trajectory-wise consensus score of actions via group sampling, then selects a…
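To make the idea of a state-dependent execution horizon concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract is cut off before the selection rule is stated, so the agreement measure (mean distance to the group mean), the threshold `tol`, and the names `step_dispersion` and `accepted_prefix_length` are all assumptions made for illustration. The sketch samples several action chunks for the same observation and executes only the leading steps on which the samples agree.

```python
from math import sqrt
from statistics import fmean


def _distance(a, b):
    """Euclidean distance between two action vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def step_dispersion(samples, t):
    """Mean distance of each sampled action at step t to the group mean.

    Hypothetical stand-in for a consensus score: low dispersion means the
    sampled chunks agree at this step.
    """
    actions = [chunk[t] for chunk in samples]
    center = [fmean(dim) for dim in zip(*actions)]
    return fmean(_distance(a, center) for a in actions)


def accepted_prefix_length(samples, tol=0.05, min_steps=1):
    """Length of the longest prefix whose per-step dispersion stays within tol.

    `tol` and `min_steps` are assumed parameters, not values from the paper.
    At least `min_steps` actions are always executed so the robot makes progress.
    """
    horizon = len(samples[0])
    n = 0
    for t in range(horizon):
        if step_dispersion(samples, t) > tol:
            break  # samples disagree here: stop committing and re-plan
        n += 1
    return max(n, min_steps)


# Three hypothetical sampled chunks (horizon 4, 2-D actions) that agree
# on the first two steps and diverge afterwards.
samples = [
    [[0.10, 0.00], [0.20, 0.00], [0.30, 0.10], [0.40, 0.50]],
    [[0.10, 0.00], [0.20, 0.01], [0.30, -0.10], [0.40, -0.50]],
    [[0.10, 0.00], [0.20, -0.01], [0.30, 0.00], [0.40, 0.00]],
]
print(accepted_prefix_length(samples))  # prints 2
```

A fixed execution horizon corresponds to always returning the same number regardless of `samples`. In this sketch the committed length shrinks when the sampled chunks diverge and grows when they agree.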
