SteerVLA: Steering Vision-Language-Action Models in Long-Tail Driving Scenarios
Tian Gao, Celine Tan, Catherine Glossop, Timothy Gao, Jiankai Sun, Kyle Stachowicz, Shirley Wu, Oier Mees, Dorsa Sadigh, Sergey Levine, Chelsea Finn

TL;DR
SteerVLA integrates vision-language reasoning with low-level driving control to improve autonomous vehicle performance, especially in rare, long-tail scenarios, by leveraging detailed language annotations and a rich language interface.
Contribution
This paper introduces SteerVLA, a novel framework that combines vision-language models with a steerable driving policy for enhanced robustness in long-tail driving scenarios.
Findings
Outperforms state-of-the-art methods by 4.77 points in overall score.
Achieves an 8.04-point improvement on long-tail driving scenarios.
Utilizes language annotations to enhance reasoning and steerability.
Abstract
A fundamental challenge in autonomous driving is the integration of high-level, semantic reasoning for long-tail events with low-level, reactive control for robust driving. While large vision-language models (VLMs) trained on web-scale data offer powerful common-sense reasoning, they lack the grounded experience necessary for safe vehicle control. We posit that an effective autonomous agent should leverage the world knowledge of VLMs to guide a steerable driving policy toward robust control in driving scenarios. To this end, we propose SteerVLA, which leverages the reasoning capabilities of VLMs to produce fine-grained language instructions that steer a vision-language-action (VLA) driving policy. Key to our method is this rich language interface between the high-level VLM and low-level VLA, which allows the high-level policy to more effectively ground its reasoning in the control…
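The two-level interface described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `Observation`, `high_level_vlm`, `low_level_vla`, and `drive_step` are hypothetical names, the keyword rules stand in for learned models, and the waypoint action format is an assumption. It only shows the data flow, in which a high-level reasoner emits a language instruction and a language-conditioned policy turns that instruction into low-level actions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed action format: (x, y) waypoints in metres in the ego frame.
Waypoint = Tuple[float, float]


@dataclass
class Observation:
    scene_description: str  # text stand-in for camera input (assumption)
    speed_mps: float        # current ego speed in metres per second


def high_level_vlm(obs: Observation) -> str:
    """Hypothetical stand-in for the high-level VLM.

    A real system would query a vision-language model; a keyword rule is
    used here only to make the sketch runnable.
    """
    if "pedestrian" in obs.scene_description:
        return "slow down and yield to the pedestrian"
    return "keep lane and maintain speed"


def low_level_vla(obs: Observation, instruction: str,
                  horizon: int = 4, dt: float = 0.5) -> List[Waypoint]:
    """Hypothetical stand-in for the language-conditioned driving policy.

    Returns straight-line waypoints whose spacing depends on the instruction.
    """
    if "slow down" in instruction:
        target_speed = obs.speed_mps * 0.5
    else:
        target_speed = obs.speed_mps
    return [(target_speed * dt * (k + 1), 0.0) for k in range(horizon)]


def drive_step(obs: Observation) -> Tuple[str, List[Waypoint]]:
    """One control step: reason in language, then act on the instruction."""
    instruction = high_level_vlm(obs)
    waypoints = low_level_vla(obs, instruction)
    return instruction, waypoints


if __name__ == "__main__":
    step = drive_step(Observation("pedestrian crossing ahead", 10.0))
    print(step)
```

The language string is the only channel between the two levels in this sketch, which is the sense in which the low-level policy is "steerable": changing the instruction changes the planned waypoints without touching the policy itself.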
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications
