VLS: Steering Pretrained Robot Policies via Vision-Language Models

Shuo Liu; Ishneet Sukhvinder Singh; Yiqing Xu; Jiafei Duan; Ranjay Krishna

arXiv:2602.03973·cs.RO·February 5, 2026

TL;DR

VLS is a training-free method that adapts pretrained robot policies at inference time using vision-language models to handle spatial and task variations without retraining.

Contribution

The paper introduces Vision-Language Steering (VLS), an inference-time control framework that guides pretrained policies using reward functions synthesized by vision-language models.

Findings

01. VLS achieves a 31% improvement on the CALVIN benchmark.

02. VLS attains a 13% gain on LIBERO-PRO.

03. VLS demonstrates robust real-world adaptation on a Franka robot.

Abstract

Why do pretrained diffusion or flow-matching policies fail when the same task is performed near an obstacle, on a shifted support surface, or amid mild clutter? Such failures rarely reflect missing motor skills; instead, they expose a limitation of imitation learning under train-test shifts, where action generation is tightly coupled to training-specific spatial configurations and task specifications. Retraining or fine-tuning to address these failures is costly and conceptually misaligned, as the required behaviors already exist but cannot be selectively adapted at test time. We propose Vision-Language Steering (VLS), a training-free framework for inference-time adaptation of frozen generative robot policies. VLS treats adaptation as an inference-time control problem, steering the sampling process of a pretrained diffusion or flow-matching policy in response to out-of-distribution…
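
To make the idea of steering a frozen policy's sampling process concrete, here is a minimal sketch of reward-guided sampling on a toy 2D velocity field. The excerpt above does not say how VLS injects its reward signal, so the additive reward-gradient term used here is an assumption, not the authors' method. All names (`base_velocity`, `reward`, `sample`, `TRAIN_TARGET`, `SHIFTED_GOAL`) are hypothetical, and the hand-written quadratic reward merely stands in for one synthesized by a vision-language model.

```python
import numpy as np

# Hypothetical stand-ins for illustration; none of these names come from the paper.
TRAIN_TARGET = np.array([1.0, 0.0])   # action the frozen toy policy was trained to produce
SHIFTED_GOAL = np.array([1.0, 0.5])   # test-time goal after a spatial shift


def base_velocity(x, t):
    """Frozen toy 'policy': a velocity field pulling samples toward TRAIN_TARGET."""
    return 4.0 * (TRAIN_TARGET - x)


def reward(x):
    """Hand-written reward standing in for a VLM-synthesized one (assumption)."""
    return -np.sum((x - SHIFTED_GOAL) ** 2)


def reward_grad(x):
    """Analytic gradient of `reward` with respect to x."""
    return -2.0 * (x - SHIFTED_GOAL)


def sample(x0, guidance_scale=0.0, num_steps=50):
    """Euler-integrate from t=0 to t=1, optionally adding a reward-gradient term.

    The policy itself is never updated; only the sampling trajectory changes.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = base_velocity(x, t) + guidance_scale * reward_grad(x)
        x = x + dt * v
    return x


rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)                  # initial noise sample
unsteered = sample(x0, guidance_scale=0.0)   # frozen policy alone
steered = sample(x0, guidance_scale=4.0)     # same policy, steered at inference time
print("reward without steering:", reward(unsteered))
print("reward with steering:   ", reward(steered))
```

In this toy setup the steered sample settles between the training target and the shifted goal, with `guidance_scale` controlling the trade-off between staying close to the pretrained behavior and satisfying the test-time reward.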

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications