Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning
Yuehao Song, Shaoyu Chen, Hao Gao, Yifan Zhu, Weixiang Yue, Jialv Zou, Bo Jiang, Zihao Lu, Yu Wang, Qian Zhang, and Xinggang Wang

TL;DR
Senna-2 aligns a vision-language model (VLM) with an end-to-end (E2E) driving policy so that planned trajectories follow the VLM's high-level decisions. It improves decision consistency and safety through a three-stage training process: driving pre-training, an open-loop stage, and a closed-loop reinforcement learning stage.
Contribution
The paper presents a three-stage training paradigm that explicitly aligns the VLM and the E2E driving policy, yielding more consistent and safer autonomous driving decisions.
Findings
19.3% F1 score improvement in dual-system consistency
5.7% FDE reduction in open-loop driving
30.6% AF-CR reduction in closed-loop safety
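For readers unfamiliar with the open-loop metric, the sketch below shows how final displacement error (FDE) is conventionally computed: the Euclidean distance between the last predicted waypoint and the last ground-truth waypoint. The function names, the toy trajectories, and the baseline numbers are illustrative assumptions, not values or code from the paper, and reading the 5.7% figure as a relative reduction is likewise an assumption.

```python
import math

def final_displacement_error(pred, gt):
    """Euclidean distance between the final predicted and ground-truth waypoints.

    `pred` and `gt` are sequences of (x, y) positions in metres (assumed layout).
    """
    (px, py), (gx, gy) = pred[-1], gt[-1]
    return math.hypot(px - gx, py - gy)

def relative_reduction(baseline, improved):
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    return (baseline - improved) / baseline

# Toy trajectories (hypothetical, not from the paper).
pred = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.3)]
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

fde = final_displacement_error(pred, gt)

# A hypothetical baseline FDE of 1.000 m dropping to 0.943 m would correspond
# to a 5.7% relative reduction under this reading.
reduction = relative_reduction(1.000, 0.943)
```

Averaging `final_displacement_error` over a dataset gives the reported metric in most planning benchmarks; the paper may use a different horizon or aggregation.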
Abstract
Vision-language models (VLMs) enhance the planning capability of end-to-end (E2E) driving policy by leveraging high-level semantic reasoning. However, existing approaches often overlook the dual-system consistency between VLM's high-level decision and E2E's low-level planning. As a result, the generated trajectories may misalign with the intended driving decisions, leading to weakened top-down guidance and decision-following ability of the system. To address this issue, we propose Senna-2, an advanced VLM-E2E driving policy that explicitly aligns the two systems for consistent decision-making and planning. Our method follows a consistency-oriented three-stage training paradigm. In the first stage, we conduct driving pre-training to achieve preliminary decision-making and planning, with a decision adapter transmitting VLM decisions to E2E policy in the form of implicit embeddings. In the…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
