Premover: Fast Vision-Language-Action Control by Acting Before Instructions Are Complete

Joonha Park; Jiseung Jeong; Taesik Gong

arXiv:2605.12160·cs.RO·May 13, 2026

TL;DR

Premover lets vision-language-action policies begin acting before the user has finished entering an instruction, using the input delay for precomputation. On LIBERO it cuts mean wall-clock time by 13.6% while keeping the success rate comparable to the full-prompt baseline.

Contribution

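Introduces Premover, a lightweight module attached to a frozen VLA backbone that turns the idle time while the user is still typing or speaking into useful precomputation, together with a learned readiness threshold that decides when the policy should begin acting.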

Findings

01. Premover reduces mean wall-clock time by 13.6% on the LIBERO benchmark.

02. Premover maintains a success rate comparable to the full-prompt baseline.

03. Naive premoving drastically lowers performance, so the gains come from Premover's approach rather than from early action alone.

Abstract

Vision-Language-Action (VLA) policies are typically evaluated as if the user had finished typing or speaking before the robot begins acting. In real deployment, however, users take several seconds to enter a request, leaving the policy idle for a substantial fraction of the interaction. We introduce Premover, a lightweight module that converts this idle window into useful precomputation. Premover keeps the VLA backbone frozen and attaches two small projection heads, one for image patches, one for language tokens, that map an intermediate layer of the backbone into a shared space. The resulting focus map is supervised by simulator-rendered target-object segmentation masks and applied as a per-patch reweighting of the next step's image tokens. A single scalar readiness threshold, trained jointly from streaming prefixes, decides when the policy should begin acting. On the LIBERO benchmark…
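
The mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract specifies only two projection heads into a shared space, a focus map applied as a per-patch reweighting, and a scalar readiness threshold. Everything else here is an assumption made for illustration, including the mean pooling of the instruction prefix, the scaled dot-product score with a sigmoid, the use of peak focus as the readiness score, and all function and parameter names (`focus_map`, `reweight`, `first_ready_prefix`, `tau`).

```python
import math


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def focus_map(patch_feats, token_feats, w_img, w_txt):
    """Score every image patch against the instruction prefix seen so far.

    patch_feats / token_feats stand in for intermediate-layer features of the
    frozen backbone; w_img / w_txt are the two small projection heads.
    Assumption: the prefix is mean-pooled and compared by scaled dot product.
    """
    if not token_feats:
        raise ValueError("need at least one instruction token")
    dim = len(w_img)  # dimension of the shared space
    patches = [matvec(w_img, p) for p in patch_feats]
    tokens = [matvec(w_txt, t) for t in token_feats]
    query = [sum(t[k] for t in tokens) / len(tokens) for k in range(dim)]
    return [
        sigmoid(sum(a * b for a, b in zip(p, query)) / math.sqrt(dim))
        for p in patches
    ]


def reweight(patch_tokens, focus):
    """Per-patch reweighting: scale each image token by its focus value."""
    return [[f * v for v in tok] for tok, f in zip(patch_tokens, focus)]


def readiness(focus):
    """Hypothetical readiness score: the peak of the focus map."""
    return max(focus)


def first_ready_prefix(patch_feats, token_stream, w_img, w_txt, tau):
    """Return (prefix length, focus map) at the first prefix whose readiness
    reaches tau; fall back to the full instruction if it never does."""
    focus = None
    for n in range(1, len(token_stream) + 1):
        focus = focus_map(patch_feats, token_stream[:n], w_img, w_txt)
        if readiness(focus) >= tau:
            break
    if focus is None:
        raise ValueError("need at least one instruction token")
    return n, focus
```

In this sketch the policy would call `first_ready_prefix` as tokens arrive and, once it returns, pass `reweight(image_tokens, focus)` to the frozen backbone for the next step. With a threshold that is never reached, it degrades to the full-prompt behaviour of waiting for the complete instruction.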
