TL;DR
This paper introduces FASTER, a method that significantly reduces reaction latency in real-time vision-language-action models by adaptively prioritizing immediate actions, enabling more responsive robot behaviors.
Contribution
FASTER is a horizon-aware scheduling approach that compresses the sampling of immediate actions, improving real-time responsiveness without sacrificing long-term trajectory quality.
Findings
FASTER reduces reaction latency tenfold in real-world tasks.
It enables rapid, accurate, and smooth trajectories in dynamic environments.
Experimental results show improved responsiveness on consumer-grade GPUs.
Abstract
Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a…
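The claim that reaction time is uniformly distributed can be illustrated with a small simulation. This is only a sketch under a simplified timing model that is an assumption here, not the paper's formulation: observations are captured once per cycle, a cycle lasts the execution horizon times the control period, an environmental change arrives at a uniformly random point in the cycle, and the first reacting action is executed one TTFA after the next observation is captured. All names and parameters (`simulate_reaction_times`, `ttfa_s`, `horizon_steps`, `control_dt_s`) are hypothetical.

```python
import random


def simulate_reaction_times(ttfa_s, horizon_steps, control_dt_s,
                            n_events=100_000, seed=0):
    """Sample reaction times under an assumed, simplified timing model.

    Assumption (not taken from the paper): a new observation is captured
    every `horizon_steps * control_dt_s` seconds, and the first action
    that reflects it is executed `ttfa_s` seconds later.
    """
    rng = random.Random(seed)
    cycle_s = horizon_steps * control_dt_s
    samples = []
    for _ in range(n_events):
        # The environmental change lands at a random point in the cycle.
        event_offset_s = rng.uniform(0.0, cycle_s)
        # Time until the next observation that can see the change.
        wait_s = cycle_s - event_offset_s
        # Reaction completes once the first action from that observation runs.
        samples.append(wait_s + ttfa_s)
    return samples


def expected_reaction_time(ttfa_s, horizon_steps, control_dt_s):
    """Mean of a uniform distribution on [TTFA, TTFA + cycle length]."""
    return ttfa_s + 0.5 * horizon_steps * control_dt_s


if __name__ == "__main__":
    samples = simulate_reaction_times(ttfa_s=0.1, horizon_steps=10,
                                      control_dt_s=0.02)
    print("min :", min(samples))
    print("max :", max(samples))
    print("mean:", sum(samples) / len(samples))
```

Under these assumptions the samples spread evenly between TTFA and TTFA plus one cycle, so lowering TTFA shifts the whole distribution down, while shortening the execution horizon narrows it.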
