Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning

Yuehao Song, Shaoyu Chen, Hao Gao, Yifan Zhu, Weixiang Yue, Jialv Zou, Bo Jiang, Zihao Lu, Yu Wang, Qian Zhang, and Xinggang Wang

arXiv:2603.11219 · cs.CV · March 13, 2026

TL;DR

Senna-2 introduces a novel approach for aligning vision-language models (VLMs) with end-to-end (E2E) driving policies. It improves decision consistency and safety through a three-stage training process: driving pre-training, open-loop training, and closed-loop reinforcement learning.

Contribution

Senna-2 presents a consistency-oriented three-stage training paradigm that explicitly aligns the VLM and the E2E driving policy, yielding more consistent and safer autonomous driving decisions.

Findings

1. 19.3% F1 score improvement in dual-system consistency
2. 5.7% FDE (final displacement error) reduction in open-loop driving
3. 30.6% AF-CR reduction in closed-loop safety
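
The summary does not say how dual-system consistency is scored. One plausible reading, shown here purely as an illustration, is to derive a coarse decision label from each planned trajectory and compare it with the VLM's decision using a macro-averaged F1. FDE is the standard mean Euclidean distance between the final predicted and ground-truth waypoints. The decision labels, the hypothetical `trajectory_to_decision` rule with its 1 m threshold, and the choice of macro averaging are assumptions rather than the paper's protocol. `relative_reduction` additionally assumes the reported percentages are relative changes.

```python
import math


def trajectory_to_decision(trajectory, lateral_threshold=1.0):
    """Hypothetical rule: map a planned trajectory to a coarse lateral decision.

    Waypoints are (x, y) in metres in the ego frame (x forward, y left);
    only the final waypoint's lateral offset is inspected.
    """
    _, y = trajectory[-1]
    if y > lateral_threshold:
        return "left"
    if y < -lateral_threshold:
        return "right"
    return "straight"


def macro_f1(reference, predicted):
    """Macro-averaged F1 over all labels seen in either sequence.

    The VLM decision is treated as the reference and the
    trajectory-derived decision as the prediction (an assumption).
    """
    pairs = list(zip(reference, predicted))
    labels = sorted(set(reference) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(1 for r, p in pairs if r == label and p == label)
        fp = sum(1 for r, p in pairs if r != label and p == label)
        fn = sum(1 for r, p in pairs if r == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def final_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between final predicted and ground-truth waypoints."""
    errors = [math.dist(p[-1], g[-1]) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    vlm_decisions = ["left", "straight", "right", "straight"]
    plans = [
        [(5.0, 0.5), (10.0, 2.0)],
        [(5.0, 0.0), (10.0, 0.1)],
        [(5.0, 0.0), (10.0, 0.2)],  # VLM said "right", but the plan goes straight
        [(5.0, 0.0), (10.0, -0.3)],
    ]
    plan_decisions = [trajectory_to_decision(t) for t in plans]
    print(plan_decisions)  # ['left', 'straight', 'straight', 'straight']
    print(round(macro_f1(vlm_decisions, plan_decisions), 3))  # 0.6
```

In this toy example, the single case where the plan ignores the VLM's "right" decision drops the consistency score from 1.0 to 0.6, which is the kind of decision-planning mismatch the paper targets.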

Abstract

Vision-language models (VLMs) enhance the planning capability of end-to-end (E2E) driving policies by leveraging high-level semantic reasoning. However, existing approaches often overlook the dual-system consistency between the VLM's high-level decisions and the E2E policy's low-level planning. As a result, the generated trajectories may misalign with the intended driving decisions, weakening the system's top-down guidance and decision-following ability. To address this issue, we propose Senna-2, an advanced VLM-E2E driving policy that explicitly aligns the two systems for consistent decision-making and planning. Our method follows a consistency-oriented three-stage training paradigm. In the first stage, we conduct driving pre-training to achieve preliminary decision-making and planning, with a decision adapter transmitting VLM decisions to the E2E policy in the form of implicit embeddings. In the…
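
The abstract states only that a decision adapter transmits VLM decisions to the E2E policy as implicit embeddings. The following is a minimal sketch of what such an interface could look like. It assumes the adapter is a learned linear projection from one VLM decision feature (for example, a hidden state) into a few planner-width tokens, which are then prepended to the planner's query tokens. `DecisionAdapter`, `condition_planner_queries`, and all dimensions are hypothetical and do not reflect Senna-2's actual implementation. Pure Python stands in for a deep-learning framework to keep the example self-contained.

```python
import random


class DecisionAdapter:
    """Hypothetical decision adapter: one linear projection that turns a single
    VLM decision feature vector into `num_tokens` implicit embeddings of the
    planner's width. Weights are randomly initialised here; in a real system
    they would be learned jointly with the planner."""

    def __init__(self, vlm_dim, planner_dim, num_tokens, seed=0):
        rng = random.Random(seed)
        self.vlm_dim = vlm_dim
        self.planner_dim = planner_dim
        self.num_tokens = num_tokens
        out_dim = planner_dim * num_tokens
        scale = vlm_dim ** -0.5  # keeps projected values on a similar scale to the input
        self.weight = [[rng.gauss(0.0, scale) for _ in range(vlm_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, vlm_feature):
        if len(vlm_feature) != self.vlm_dim:
            raise ValueError("unexpected VLM feature size")
        flat = [sum(w * x for w, x in zip(row, vlm_feature)) + b
                for row, b in zip(self.weight, self.bias)]
        # Split the flat projection into num_tokens vectors of size planner_dim.
        d = self.planner_dim
        return [flat[t * d:(t + 1) * d] for t in range(self.num_tokens)]


def condition_planner_queries(planner_queries, decision_embeddings):
    """Prepend the decision embeddings to the planner's query tokens so that
    downstream attention layers can read the VLM decision (an assumed design)."""
    return decision_embeddings + planner_queries


if __name__ == "__main__":
    adapter = DecisionAdapter(vlm_dim=8, planner_dim=4, num_tokens=2)
    decision_embeddings = adapter([0.1] * 8)
    queries = [[0.0] * 4 for _ in range(6)]
    tokens = condition_planner_queries(queries, decision_embeddings)
    print(len(decision_embeddings), len(tokens))  # 2 8
```

One motivation for passing decisions as embeddings rather than as text is that the interface stays differentiable, so the planner's training loss can also update the adapter.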

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications