PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning
Yupeng Zheng, Zebin Xing, Qichao Zhang, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao

TL;DR
PlanAgent is a vehicle motion planning system built on a multi-modal large language model (MLLM). It aims to improve interpretability, reasoning, and generalization in autonomous driving, particularly in complex long-tailed scenarios, and is reported to outperform existing methods on nuPlan benchmarks.
Contribution
The paper presents the first mid-to-mid planning system using a multi-modal large language model for closed-loop vehicle motion planning, integrating scene understanding, reasoning, and reflection modules.
Findings
Outperforms state-of-the-art methods on nuPlan benchmarks
Effectively handles complex long-tailed scenarios
Demonstrates improved generalization and interpretability
Abstract
Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to achieve superior performance over rule-based approaches in large-scale closed-loop scenarios. To address these issues, we propose PlanAgent, the first mid-to-mid planning system based on a Multi-modal Large Language Model (MLLM). The MLLM is used as a cognitive agent to introduce human-like knowledge, interpretability, and common-sense reasoning into closed-loop planning. Specifically, PlanAgent leverages the power of the MLLM through three core modules. First, an Environment Transformation module constructs a Bird's Eye View (BEV) map and a lane-graph-based textual description from the environment as inputs. Second,…
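To make the idea of a lane-graph-based textual description more concrete, here is a minimal Python sketch of how a small lane graph might be rendered as text for a language-model prompt. It is an illustration only, not the paper's implementation: the `Lane` class, its fields, the `describe_lane_graph` function, and the wording of the output are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Lane:
    """Hypothetical lane node; field names are illustrative, not from the paper."""
    lane_id: str
    successors: list[str] = field(default_factory=list)  # lanes this lane leads into
    left_neighbor: str | None = None
    right_neighbor: str | None = None
    vehicles: list[str] = field(default_factory=list)    # IDs of agents in this lane


def describe_lane_graph(lanes: dict[str, Lane], ego_lane_id: str) -> str:
    """Render a lane graph as plain text, one sentence per lane."""
    lines = [f"Ego vehicle is in lane {ego_lane_id}."]
    for lane_id in sorted(lanes):
        lane = lanes[lane_id]
        facts = []
        if lane.successors:
            facts.append("connects to " + ", ".join(lane.successors))
        if lane.left_neighbor:
            facts.append(f"left neighbor {lane.left_neighbor}")
        if lane.right_neighbor:
            facts.append(f"right neighbor {lane.right_neighbor}")
        if lane.vehicles:
            facts.append("occupied by " + ", ".join(lane.vehicles))
        else:
            facts.append("unoccupied")
        lines.append(f"Lane {lane_id}: " + "; ".join(facts) + ".")
    return "\n".join(lines)


# Toy scene (assumed): two parallel lanes A and B that both lead into lane C.
example_lanes = {
    "A": Lane("A", successors=["C"], left_neighbor="B", vehicles=["ego"]),
    "B": Lane("B", successors=["C"], right_neighbor="A", vehicles=["car_1"]),
    "C": Lane("C"),
}
description = describe_lane_graph(example_lanes, ego_lane_id="A")
print(description)
```

A text form like this keeps lane topology and occupancy explicit, which is one plausible reason to pair it with a BEV image; the actual prompt format used by PlanAgent is not given in the truncated abstract above.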
Taxonomy
Topics: Natural Language Processing Techniques
