SteerVLA: Steering Vision-Language-Action Models in Long-Tail Driving Scenarios

Tian Gao; Celine Tan; Catherine Glossop; Timothy Gao; Jiankai Sun; Kyle Stachowicz; Shirley Wu; Oier Mees; Dorsa Sadigh; Sergey Levine; Chelsea Finn

arXiv:2602.08440 · cs.RO · February 16, 2026

TL;DR

SteerVLA integrates vision-language reasoning with low-level driving control to improve autonomous vehicle performance, especially in rare, long-tail scenarios, by leveraging detailed language annotations and a rich language interface.

Contribution

This paper introduces SteerVLA, a novel framework that combines vision-language models with a steerable driving policy for enhanced robustness in long-tail driving scenarios.

Findings

01 Outperforms state-of-the-art methods by 4.77 points in overall score.

02 Achieves an 8.04-point improvement on long-tail driving scenarios.

03 Uses language annotations to enhance reasoning and steerability.

Abstract

A fundamental challenge in autonomous driving is the integration of high-level, semantic reasoning for long-tail events with low-level, reactive control for robust driving. While large vision-language models (VLMs) trained on web-scale data offer powerful common-sense reasoning, they lack the grounded experience necessary for safe vehicle control. We posit that an effective autonomous agent should leverage the world knowledge of VLMs to guide a steerable driving policy toward robust control in driving scenarios. To this end, we propose SteerVLA, which leverages the reasoning capabilities of VLMs to produce fine-grained language instructions that steer a vision-language-action (VLA) driving policy. Key to our method is this rich language interface between the high-level VLM and low-level VLA, which allows the high-level policy to more effectively ground its reasoning in the control…
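
To make the two-level structure described in the abstract concrete, here is a minimal sketch of a high-level reasoner handing a language instruction to a low-level policy. It is only an illustration under stated assumptions, not the authors' implementation: `Observation`, `high_level_reasoner`, `low_level_policy`, and `drive_step` are hypothetical names, the keyword rules stand in for a real VLM and VLA, and the waypoint output format is assumed.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed action format: (forward, lateral) offsets in metres, ego frame.
Waypoint = Tuple[float, float]


@dataclass
class Observation:
    scene_description: str  # text stand-in for camera input (assumption)
    speed_mps: float        # current ego speed in metres per second


def high_level_reasoner(obs: Observation) -> str:
    """Hypothetical stand-in for the VLM: scene in, language instruction out."""
    if "construction" in obs.scene_description:
        return "slow down and keep left of the cones"
    return "keep lane at current speed"


def low_level_policy(obs: Observation, instruction: str,
                     horizon: int = 4, dt: float = 0.5) -> List[Waypoint]:
    """Hypothetical stand-in for the VLA: the instruction conditions the plan."""
    speed = obs.speed_mps * (0.5 if "slow down" in instruction else 1.0)
    lateral = 0.5 if "left" in instruction else 0.0
    # Constant-speed rollout: one waypoint per time step of length dt.
    return [(speed * dt * (k + 1), lateral) for k in range(horizon)]


def drive_step(obs: Observation) -> Tuple[str, List[Waypoint]]:
    """One control step: reason in language first, then act on the instruction."""
    instruction = high_level_reasoner(obs)
    return instruction, low_level_policy(obs, instruction)


if __name__ == "__main__":
    obs = Observation("construction zone with cones", speed_mps=10.0)
    instruction, waypoints = drive_step(obs)
    print(instruction)
    print(waypoints)
```

The point of the sketch is the interface: the only thing passed from the high level to the low level is a string, so changing the instruction changes the planned waypoints without touching the policy itself.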

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications