Dynamic Execution Commitment of Vision-Language-Action Models
Feng Chen, Xianghui Wang, Yuxuan Chen, Boying Li, Yefei He, Zeyu Zhang, Yicheng Wu

TL;DR
This paper introduces A3, an adaptive mechanism for vision-language-action models that dynamically determines execution horizons based on self-verification, improving robustness and efficiency without manual tuning.
Contribution
A3 reframes execution commitment as a self-speculative verification problem, enabling dynamic, reliable, and efficient action execution in vision-language-action models.
Findings
A3 eliminates manual horizon tuning across diverse models.
A3 achieves a better trade-off between robustness and throughput.
A3 maintains physical rollout integrity through prefix verification.
Abstract
Vision-Language-Action (VLA) models predominantly adopt action chunking, i.e., predicting and committing to a short horizon of consecutive low-level actions in a single forward pass, to amortize the inference cost of large-scale backbones and reduce per-step latency. However, committing these multi-step predictions to real-world execution requires balancing success rate against inference efficiency, a decision typically governed by fixed execution horizons tuned per task. Such heuristics ignore the state-dependent nature of predictive reliability, leading to brittle performance in dynamic or out-of-distribution settings. In this paper, we introduce A3, an Adaptive Action Acceptance mechanism that reframes dynamic execution commitment as a self-speculative prefix verification problem. A3 first computes a trajectory-wise consensus score of actions via group sampling, then selects a…
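The abstract is cut off before it states how A3 selects and verifies actions, so the following is only a minimal sketch of the general idea of consensus-based prefix acceptance, not the authors' method. It assumes a hypothetical setup in which the policy has already been sampled several times from the same observation, yielding a group of action chunks. The distance-based consensus score, the `tolerance` parameter, the rule of keeping at least one action, and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

Action = Sequence[float]   # one low-level action vector
Chunk = Sequence[Action]   # consecutive actions from one forward pass


def chunk_distance(a: Chunk, b: Chunk) -> float:
    """Mean per-step Euclidean distance between two equally long chunks."""
    return sum(math.dist(x, y) for x, y in zip(a, b)) / len(a)


def consensus_scores(samples: Sequence[Chunk]) -> List[float]:
    """Score each sampled chunk by its negated mean distance to the others.

    Assumed scoring rule (not from the paper); needs at least two samples.
    """
    scores = []
    for i, chunk in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        total = sum(chunk_distance(chunk, o) for o in others)
        scores.append(-total / len(others))
    return scores


def accepted_prefix(samples: Sequence[Chunk], tolerance: float) -> List[Action]:
    """Return the leading actions of the highest-consensus chunk that every
    sample matches within `tolerance`; always keep at least one action."""
    scores = consensus_scores(samples)
    best = max(range(len(samples)), key=scores.__getitem__)
    reference = samples[best]
    length = 0
    for t, action in enumerate(reference):
        # Stop at the first step where any sample disagrees with the reference.
        if any(math.dist(action, s[t]) > tolerance for s in samples):
            break
        length = t + 1
    return list(reference[: max(length, 1)])


# Three hypothetical samples of a 4-step chunk with 1-D actions.
samples = [
    [[0.0], [0.1], [0.2], [0.3]],
    [[0.0], [0.1], [0.9], [0.3]],
    [[0.0], [0.1], [0.2], [1.5]],
]
prefix = accepted_prefix(samples, tolerance=0.05)
```

In this toy group the samples agree on the first two steps and diverge at the third, so only a two-step prefix is committed and the policy would be queried again afterwards. When the samples agree throughout, the whole chunk is executed, which is how a state-dependent horizon can recover the throughput of action chunking without a fixed, per-task setting.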
