TL;DR
Drive-R1 is a novel vision-language model that integrates reasoning and planning for autonomous driving by combining supervised fine-tuning and reinforcement learning, leading to improved decision-making and trajectory prediction.
Contribution
The paper introduces Drive-R1, a VLM that bridges reasoning and planning in autonomous driving through step-by-step reasoning and reinforcement learning-based optimization.
Findings
Drive-R1 outperforms existing VLMs on nuScenes and DriveLM-nuScenes benchmarks.
Reinforcement learning enhances reasoning quality for better planning.
Step-by-step reasoning improves the interpretability and accuracy of planning decisions.
Abstract
Large vision-language models (VLMs) for autonomous driving (AD) are evolving beyond perception and cognition tasks toward motion planning. However, we identify two critical challenges in this direction: (1) VLMs tend to learn shortcuts by relying heavily on history input information, achieving seemingly strong planning results without genuinely understanding the visual inputs; and (2) the chain-of-thought (COT) reasoning processes are always misaligned with the motion planning outcomes, and how to effectively leverage the complex reasoning capability to enhance planning remains largely underexplored. In this paper, we start from a small-scale domain-specific VLM and propose Drive-R1, designed to bridge scenario reasoning and motion planning for AD. Drive-R1 first undergoes supervised fine-tuning on an elaborate dataset containing both long and short COT data. Drive-R1 is encouraged…