Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

Siyuan Yang; Yang Zhang; Haoran He; Ling Pan; Xiu Li; Chenjia Bai; Xuelong Li

arXiv:2512.02834·cs.RO·December 3, 2025

TL;DR

This paper introduces TACO, a test-time scaling method that improves the stability and success rate of vision-language-action (VLA) models at inference by mitigating the fragility caused by distribution shift, without requiring retraining.

Contribution

Proposes TACO, a lightweight, inference-only framework using pseudo-count verification to improve VLA model robustness against distribution shifts during task execution.
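
The idea of verifying sampled actions with a pseudo-count can be illustrated with a toy sketch. This is not the authors' implementation: the kernel-based `pseudo_count`, the `select_action` helper, the `bandwidth` and `num_candidates` parameters, and the two-dimensional toy policy are all assumptions made up for illustration. It only shows the general best-of-N pattern of drawing several candidate actions at inference time and keeping the one that lies closest to well-covered reference data.

```python
import math
import random


def pseudo_count(candidate, reference_actions, bandwidth=0.5):
    """Soft count of reference actions near `candidate`.

    Hypothetical stand-in for a pseudo-count estimator: a sum of Gaussian
    kernels over the reference set. Not taken from the paper.
    """
    total = 0.0
    for ref in reference_actions:
        sq_dist = sum((c - r) ** 2 for c, r in zip(candidate, ref))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total


def select_action(sample_action, reference_actions, num_candidates=8):
    """Draw several candidates and keep the one with the highest pseudo-count."""
    candidates = [sample_action() for _ in range(num_candidates)]
    return max(candidates, key=lambda a: pseudo_count(a, reference_actions))


# Toy setup (assumed): reference actions cluster around (1, 1), while the
# stand-in policy also samples from a redundant mode around (-1, -1).
rng = random.Random(0)
reference_actions = [
    (1.0 + rng.gauss(0, 0.1), 1.0 + rng.gauss(0, 0.1)) for _ in range(50)
]


def toy_policy_sample():
    center = 1.0 if rng.random() < 0.5 else -1.0
    return (center + rng.gauss(0, 0.1), center + rng.gauss(0, 0.1))


chosen = select_action(toy_policy_sample, reference_actions, num_candidates=8)
print("selected action:", chosen)
```

In this sketch the policy itself is never updated; only the choice among its samples changes, which is what makes the approach inference-only. Raising `num_candidates` spends more compute per step in exchange for a lower chance that every candidate falls in the redundant mode.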

Findings

01. Significantly improves inference stability across multiple benchmarks.

02. Increases success rates in downstream task adaptations.

03. Reduces computational cost compared to reinforcement learning updates.

Abstract

Vision-Language-Action (VLA) models, trained via flow-matching or diffusion objectives, excel at learning complex behaviors from large-scale, multi-modal datasets (e.g., human teleoperation, scripted policies). However, because VLAs incorporate diverse data modes in the pre-training stage, and the finetuning dataset often contains demonstration data collected in a kinematically suboptimal or undesirable way, there exist redundant action modes that are irrelevant to the successful action modes of the downstream task. Specifically, we observe a critical inference-time fragility across different sampled noises after supervised finetuning of pre-trained VLAs. In this paper, we attribute this instability to the distribution shift between the VLA policy and the policy induced by the stable success modes of the downstream task dataset. Thus, we propose TACO, a test-time-scaling (TTS) framework that…

Code & Models

Models

Hugging Face: rhodes-team-teleai/pi05_TACO_libero_finetuned (model · 9 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning