Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
Siyuan Yang, Yang Zhang, Haoran He, Ling Pan, Xiu Li, Chenjia Bai, Xuelong Li

TL;DR
This paper introduces TACO, a test-time scaling method that enhances the stability and success of vision-language-action models during inference by preventing distribution shift-induced fragility, without requiring retraining.
Contribution
Proposes TACO, a lightweight, inference-only framework using pseudo-count verification to improve VLA model robustness against distribution shifts during task execution.
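As a rough illustration of what inference-only verification with pseudo-counts could look like, the sketch below samples several candidate actions from a policy and keeps the one that scores highest under a count estimate built from reference data. This summary does not describe the paper's actual estimator, so everything here is an assumption: the kernel-smoothed count, the `bandwidth` value, the helper names `pseudo_count` and `select_action`, and the toy two-mode policy are hypothetical stand-ins, not TACO's implementation.

```python
import numpy as np


def pseudo_count(candidates, reference_actions, bandwidth=0.5):
    """Kernel-smoothed count of reference actions near each candidate.

    Hypothetical stand-in for a pseudo-count estimator: each reference
    action contributes a Gaussian weight that decays with distance.
    """
    diffs = candidates[:, None, :] - reference_actions[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).sum(axis=1)


def select_action(sample_policy, reference_actions, num_candidates=16, rng=None):
    """Sample candidates and return the one with the highest pseudo-count."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = np.stack([sample_policy(rng) for _ in range(num_candidates)])
    counts = pseudo_count(candidates, reference_actions)
    best = int(np.argmax(counts))
    return candidates[best], counts


# Toy setup (assumed): reference data clustered around (1, 1) stands in for
# the success modes of a downstream dataset.
rng = np.random.default_rng(0)
reference_actions = rng.normal(loc=1.0, scale=0.1, size=(200, 2))


def toy_policy(rng):
    """Bimodal sampler: one mode near the reference data, one far from it."""
    center = 1.0 if rng.random() < 0.5 else -1.0
    return rng.normal(loc=center, scale=0.1, size=2)


action, counts = select_action(toy_policy, reference_actions, rng=rng)
```

The policy itself is untouched; only the choice among its samples changes, which is what makes this kind of scheme usable without retraining. The cost scales with the number of candidates drawn per step.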
Findings
Significantly improves inference stability across multiple benchmarks.
Increases success rates in downstream task adaptations.
Reduces computational cost compared to reinforcement learning updates.
Abstract
Vision-Language-Action (VLA) models, trained via flow-matching or diffusion objectives, excel at learning complex behaviors from large-scale, multi-modal datasets (e.g., human teleoperation, scripted policies). However, because VLAs incorporate diverse data modes during pre-training, and the finetuning dataset often contains demonstrations collected in kinematically suboptimal or otherwise undesirable ways, there exist redundant action modes that are irrelevant to the successful action modes of the downstream task. Specifically, we observe a critical inference-time fragility across different sampled noises after supervised finetuning of pre-trained VLAs. In this paper, we attribute this instability to the distribution shift between the VLA policy and the policy induced by the stable success modes of the downstream task dataset. Thus, we propose TACO, a test-time-scaling (TTS) framework that…
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning
