ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving
Xueyi Liu, Zuodong Zhong, Yuxin Guo, Yun-Fu Liu, Zhiguo Su, Qichao Zhang, Junli Wang, Yinfeng Gao, Yupeng Zheng, Qiao Lin, Huiyong Chen, Dongbin Zhao

TL;DR
ReasonPlan introduces a novel multimodal large language model framework for closed-loop autonomous driving, combining holistic reasoning, self-supervised scene prediction, and decision reasoning to improve interpretability and generalization over traditional methods.
Contribution
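It presents a new fine-tuning framework for MLLMs in closed-loop autonomous driving. The framework pairs a self-supervised Next Scene Prediction task with a supervised Decision Chain-of-Thought process, is trained on a curated planning-oriented decision reasoning dataset (PDR), and outperforms existing imitation learning approaches.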
Findings
Outperforms mainstream imitation learning on Bench2Drive by 19% in L2 error and by 16.1 points in driving score
Demonstrates strong zero-shot generalization on unseen benchmarks
Curates PDR, a diverse, planning-oriented decision reasoning dataset for autonomous driving at the 210k scale
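For readers unfamiliar with the L2 metric named in the findings, here is a minimal sketch of how an average L2 trajectory error and a relative reduction are commonly computed. It is not the paper's evaluation code: the function names, the waypoints, and the reading of "19% L2" as a relative reduction in average L2 error are all assumptions made for illustration, and the benchmark's actual horizon and sampling rate are not stated on this page.

```python
import math


def average_l2_error(predicted, ground_truth):
    """Mean Euclidean distance between matched (x, y) waypoints."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def relative_reduction(baseline_error, new_error):
    """Fractional reduction versus the baseline (0.19 would mean 19% lower)."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical waypoints, invented for illustration only (not from the paper).
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
baseline_plan = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
improved_plan = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]

baseline_error = average_l2_error(baseline_plan, ground_truth)
improved_error = average_l2_error(improved_plan, ground_truth)
reduction = relative_reduction(baseline_error, improved_error)
print(baseline_error, improved_error, reduction)
```

L2 error is an open-loop measure of how far planned waypoints sit from the recorded ones, whereas the driving score on Bench2Drive is a closed-loop measure, so the two numbers in the findings capture different aspects of performance.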
Abstract
Owing to their powerful vision-language reasoning and generalization abilities, multimodal large language models (MLLMs) have garnered significant attention in the field of end-to-end (E2E) autonomous driving. However, their application to closed-loop systems remains underexplored, and current MLLM-based methods have not shown clear superiority over mainstream E2E imitation learning approaches. In this work, we propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning with a self-supervised Next Scene Prediction task and a supervised Decision Chain-of-Thought process. This dual mechanism encourages the model to align visual representations with actionable driving context, while promoting interpretable and causally grounded decision making. We curate a planning-oriented decision reasoning dataset, namely PDR, comprising 210k diverse and…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Traffic Prediction and Management Techniques
Methods: Softmax · Attention Is All You Need · ALIGN
