Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

William Chen; Jagdeep Singh Bhatia; Catherine Glossop; Nikhil Mathihalli; Ria Doshi; Andy Tang; Danny Driess; Karl Pertsch; Sergey Levine

arXiv:2602.13193 · cs.RO · April 7, 2026

TL;DR

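This paper introduces Steerable Policies: vision-language-action models (VLAs) trained on rich synthetic commands at several levels of abstraction, so that a pretrained vision-language model (VLM) can steer low-level robot behavior more precisely. The result is better task generalization and controllability.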

Contribution

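The paper grounds VLM knowledge in low-level policies by training VLAs on rich synthetic commands (subtasks, motions, and grounded pixel coordinates) instead of relying only on natural language task instructions, which improves both task generalization and controllability.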

Findings

01 Outperforms prior VLAs and hierarchical baselines in real-world manipulation tasks.

02 Enables control via a learned high-level reasoner and off-the-shelf VLM prompting.

03 Demonstrates improved generalization and long-horizon task performance.

Abstract

Pretrained vision-language models (VLMs) can make semantic and visual inferences across diverse settings, providing valuable common-sense priors for robotic control. However, effectively grounding this knowledge in robot behaviors remains an open challenge. Prior methods often employ a hierarchical approach where VLMs reason over high-level commands to be executed by separate low-level policies, e.g., vision-language-action models (VLAs). The interface between VLMs and VLAs is usually natural language task instructions, which fundamentally limits how much VLM reasoning can steer low-level behavior. We thus introduce Steerable Policies: VLAs trained on rich synthetic commands at various levels of abstraction, like subtasks, motions, and grounded pixel coordinates. By improving low-level controllability, Steerable Policies can unlock pretrained knowledge in VLMs, enabling improved task…
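To make the idea of commands "at various levels of abstraction" concrete, here is a minimal sketch of how such commands might be represented and rendered into text for a low-level policy. This is an illustration only: the `Level`, `Command`, and `render` names, the string format, and the example coordinates are all hypothetical and are not taken from the paper, whose actual command format is not given in this listing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Level(Enum):
    """Hypothetical abstraction levels, mirroring those named in the abstract."""
    SUBTASK = "subtask"
    MOTION = "motion"
    PIXEL = "pixel"


@dataclass(frozen=True)
class Command:
    """A single steering command (illustrative structure, not the paper's)."""
    level: Level
    text: str
    pixel: Optional[Tuple[int, int]] = None  # (x, y) image coordinates

    def render(self) -> str:
        """Turn the command into a text prompt for a low-level policy."""
        if self.level is Level.PIXEL:
            if self.pixel is None:
                raise ValueError("pixel-level command needs coordinates")
            x, y = self.pixel
            return f"{self.text} at ({x}, {y})"
        return self.text


# Example commands at each level; the wording and coordinates are made up.
commands = [
    Command(Level.SUBTASK, "pick up the mug"),
    Command(Level.MOTION, "move the gripper left"),
    Command(Level.PIXEL, "grasp the object", pixel=(212, 148)),
]
rendered = [c.render() for c in commands]
```

In a hierarchy like the one the abstract describes, a high-level reasoner would choose which level of command to issue at each step, and the low-level policy would consume the rendered string alongside its camera observations.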
