DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Erfei Cui, Wenhai Wang, Zhiqi Li, Jiangwei Xie, Haoming Zou, Hanming Deng, Gen Luo, Lewei Lu, Xizhou Zhu, Jifeng Dai

TL;DR
DriveMLM introduces a multimodal large language model framework for autonomous driving that integrates decision-making, sensor inputs, and explanations, improving existing systems' performance in realistic simulations.
Contribution
This work presents the first multimodal LLM-based framework for autonomous driving that can be integrated into existing systems for improved decision-making and explainability.
Findings
Significant performance improvements in CARLA simulations.
Effective integration of LLMs into autonomous driving modules.
Enhanced decision explanation capabilities.
Abstract
Large language models (LLMs) have opened up new possibilities for intelligent agents, endowing them with human-like thinking and cognitive abilities. In this work, we delve into the potential of LLMs in autonomous driving (AD). We introduce DriveMLM, an LLM-based AD framework that can perform closed-loop autonomous driving in realistic simulators. To this end, (1) we bridge the gap between language decisions and vehicle control commands by standardizing the decision states according to the off-the-shelf motion planning module. (2) We employ a multimodal LLM (MLLM) to model the behavior planning module of a modular AD system, which takes driving rules, user commands, and inputs from various sensors (e.g., camera, lidar), makes driving decisions, and provides explanations. This model can be used plug-and-play in existing AD systems such as Autopilot and…
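Point (1) of the abstract, standardizing language decisions into states a motion planner can consume, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the state names, the `speed=…; path=…; because=…` text format, and the `parse_decision` helper are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical decision vocabulary; the paper's actual state set may differ.
class SpeedDecision(Enum):
    KEEP = "keep"
    ACCELERATE = "accelerate"
    DECELERATE = "decelerate"
    STOP = "stop"


class PathDecision(Enum):
    FOLLOW = "follow"
    LEFT_CHANGE = "left_change"
    RIGHT_CHANGE = "right_change"


@dataclass(frozen=True)
class DecisionState:
    speed: SpeedDecision
    path: PathDecision
    explanation: str


def parse_decision(llm_output: str) -> DecisionState:
    """Map free-form model text onto a fixed set of decision states.

    Assumes the (hypothetical) format 'speed=<s>; path=<p>; because=<text>'.
    Raises ValueError if a field is missing or outside the vocabulary.
    """
    fields = {}
    # Split at most twice so the explanation may itself contain ';'.
    for part in llm_output.split(";", 2):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip().lower()] = value.strip()
    # Enum lookup by value rejects anything the planner would not understand.
    speed = SpeedDecision(fields.get("speed", "").lower())
    path = PathDecision(fields.get("path", "").lower())
    return DecisionState(speed, path, fields.get("because", ""))


state = parse_decision("speed=STOP; path=follow; because=red light ahead")
print(state.speed.name, state.path.name, "-", state.explanation)
```

Restricting the output to an enumerated vocabulary means a downstream planner never has to interpret free text: an unrecognized decision fails at parse time instead of reaching vehicle control.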
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator · Adaptive Parameter-wise Diagonal Quasi-Newton Method
