DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

Erfei Cui; Wenhai Wang; Zhiqi Li; Jiangwei Xie; Haoming Zou; Hanming Deng; Gen Luo; Lewei Lu; Xizhou Zhu; Jifeng Dai

arXiv:2312.09245·cs.CV·December 18, 2025·22 cites

TL;DR

DriveMLM is a multimodal large language model framework for autonomous driving that takes sensor inputs, makes driving decisions, and explains them, improving the performance of existing systems in realistic simulators.

Contribution

This work presents the first multimodal LLM-based framework for autonomous driving that can be integrated into existing systems for improved decision-making and explainability.

Findings

01. Significant performance improvements in CARLA simulations.

02. Effective integration of LLMs into autonomous driving modules.

03. Enhanced decision explanation capabilities.

Abstract

Large language models (LLMs) have opened up new possibilities for intelligent agents, endowing them with human-like thinking and cognitive abilities. In this work, we delve into the potential of LLMs in autonomous driving (AD). We introduce DriveMLM, an LLM-based AD framework that can perform closed-loop autonomous driving in realistic simulators. To this end, (1) we bridge the gap between language decisions and vehicle control commands by standardizing the decision states according to an off-the-shelf motion planning module. (2) We employ a multimodal LLM (MLLM) to model the behavior planning module of a modular AD system, which takes driving rules, user commands, and inputs from various sensors (e.g., camera, lidar) as input, makes driving decisions, and provides explanations. This model can plug-and-play in existing AD systems such as Autopilot and…
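Step (1) of the abstract, standardizing decision states so that a language decision can be handed to a motion planner, can be illustrated with a minimal sketch. Everything below is a hypothetical illustration and not the authors' implementation: the state names, the `BehaviorDecision` type, the `parse_decision` helper, and the `speed=...; path=...; why=...` reply format are all assumptions chosen to show the idea of mapping free-form model output onto a small, closed set of planner-readable states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SpeedDecision(Enum):
    # Hypothetical speed states; the abstract does not list the actual set.
    KEEP = "keep"
    ACCELERATE = "accelerate"
    DECELERATE = "decelerate"
    STOP = "stop"


class PathDecision(Enum):
    # Hypothetical path states, likewise assumed for illustration.
    FOLLOW = "follow"
    LEFT_CHANGE = "left_change"
    RIGHT_CHANGE = "right_change"


@dataclass(frozen=True)
class BehaviorDecision:
    speed: SpeedDecision
    path: PathDecision
    explanation: str


def parse_decision(reply: str) -> Optional[BehaviorDecision]:
    """Parse a reply of the assumed form 'speed=<s>; path=<p>; why=<text>'.

    Returns None when the reply does not name a known state, so the caller
    can fall back to a safe default instead of acting on free-form text.
    """
    fields = {}
    # Split into at most three parts so the explanation may contain ';'.
    for part in reply.split(";", 2):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip().lower()] = value.strip()
    try:
        speed = SpeedDecision(fields["speed"].lower())
        path = PathDecision(fields["path"].lower())
    except (KeyError, ValueError):
        return None
    return BehaviorDecision(speed, path, fields.get("why", ""))
```

In this sketch, a call such as `parse_decision("speed=decelerate; path=follow; why=pedestrian ahead")` yields a typed decision that a downstream planner could consume, while an unrecognized state yields `None`. Restricting the output to enumerated states is the design point: the planner never has to interpret natural language directly.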

Code & Models

Datasets

taesiri/arxiv_qa
dataset · 193 dl

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator · Adaptive Parameter-wise Diagonal Quasi-Newton Method