BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models
Xiaoyu Ma, Zhengqing Yuan, Zheyuan Zhang, Kaiwen Shi, Lichao Sun, Yanfang Ye

TL;DR
BLURR is a lightweight inference wrapper that accelerates vision-language-action models for real-time applications without retraining, maintaining performance while reducing computational costs.
Contribution
BLURR introduces a plug-in inference acceleration method for VLA models that preserves original interfaces and improves efficiency without retraining.
Findings
Maintains task success rates comparable to the original controller.
Significantly reduces effective FLOPs and wall-clock latency.
Enables real-time manipulation with interactive web demos.
Abstract
Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without retraining or changing model checkpoints. Instantiated on the pi-zero VLA controller, BLURR keeps the original observation interfaces and accelerates control by combining an instruction-prefix key-value cache, mixed-precision execution, and a single-step rollout schedule that reduces per-step computation. In our SimplerEnv-based evaluation, BLURR maintains task success rates comparable to the original controller while significantly lowering effective FLOPs and wall-clock latency. We also build an interactive web demo that allows users to switch between controllers and toggle…
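Illustrative sketch
The instruction-prefix key-value cache named in the abstract can be illustrated with a toy example. The Python sketch below is not BLURR's implementation: it uses a single-head attention layer in place of a real VLA backbone, and every name in it (PrefixKVCache, step_cached, step_uncached, the weight matrices) is hypothetical. It assumes that the instruction tokens are fixed across control steps and do not attend to observation tokens, so their key and value projections can be computed once and reused.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


class PrefixKVCache:
    """Hypothetical cache: stores K/V projections of an instruction prefix."""

    def __init__(self, W_k, W_v):
        self.W_k, self.W_v = W_k, W_v
        self._store = {}
        self.projections = 0  # number of prefix tokens projected so far

    def get(self, instruction_id, prefix):
        if instruction_id not in self._store:
            keys = [matvec(self.W_k, tok) for tok in prefix]
            values = [matvec(self.W_v, tok) for tok in prefix]
            self.projections += len(prefix)
            self._store[instruction_id] = (keys, values)
        return self._store[instruction_id]


def step_cached(cache, instruction_id, prefix, obs, W_q):
    """One control step that reuses cached prefix keys and values."""
    keys, values = cache.get(instruction_id, prefix)
    k_obs, v_obs = matvec(cache.W_k, obs), matvec(cache.W_v, obs)
    # Build new lists so the cached entries are never mutated.
    return attend(matvec(W_q, obs), keys + [k_obs], values + [v_obs])


def step_uncached(prefix, obs, W_q, W_k, W_v):
    """Reference step that re-projects every prefix token each time."""
    tokens = prefix + [obs]
    keys = [matvec(W_k, tok) for tok in tokens]
    values = [matvec(W_v, tok) for tok in tokens]
    return attend(matvec(W_q, obs), keys, values)


rng = random.Random(0)
DIM, PREFIX_LEN, STEPS = 4, 6, 3


def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


W_q, W_k, W_v = ([rand_vec() for _ in range(DIM)] for _ in range(3))
prefix = [rand_vec() for _ in range(PREFIX_LEN)]  # stand-in instruction tokens
cache = PrefixKVCache(W_k, W_v)

outputs, max_diff = [], 0.0
for _ in range(STEPS):
    obs = rand_vec()  # stand-in for the current observation token
    fast = step_cached(cache, "pick-up-block", prefix, obs, W_q)
    slow = step_uncached(prefix, obs, W_q, W_k, W_v)
    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(fast, slow)))
    outputs.append(fast)

print("prefix tokens projected:", cache.projections)
print("max difference vs. uncached:", max_diff)
```

In this toy setting the cached path projects the prefix once in total, while the uncached path projects it once per control step, and both produce the same attention output. The saving grows with the prefix length and the number of control steps issued under the same instruction.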
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robot Manipulation and Learning
