TL;DR
AutoDrive-P3 introduces a unified perception-prediction-planning framework with structured reasoning and reinforcement learning, enhancing autonomous driving safety, interpretability, and performance.
Contribution
It presents a novel integrated framework with hierarchical reinforcement learning and a new dataset for coherent reasoning in autonomous driving.
Findings
Achieves state-of-the-art planning performance on nuScenes and NAVSIM benchmarks.
Demonstrates improved safety and interpretability through structured reasoning.
Provides detailed and fast thinking modes to balance reasoning depth against inference efficiency.
Abstract
Vision-language models (VLMs) are increasingly being adopted for end-to-end autonomous driving systems due to their exceptional performance in handling long-tail scenarios. However, current VLM-based approaches suffer from two major limitations: 1) Some VLMs directly output planning results without chain-of-thought (CoT) reasoning, bypassing crucial perception and prediction stages, which creates a significant domain gap and compromises decision-making capability; 2) Other VLMs can generate outputs for perception, prediction, and planning tasks but employ a fragmented decision-making approach in which these modules operate separately, leading to a significant lack of synergy that undermines true planning performance. To address these limitations, we propose AutoDrive-P3, a novel framework that seamlessly integrates Perception, Prediction, and…
