PADriver: Towards Personalized Autonomous Driving

Genghua Kou; Fan Jia; Weixin Mao; Yingfei Liu; Yucheng Zhao; Ziheng Zhang; Osamu Yoshie; Tiancai Wang; Ying Li; Xiangyu Zhang

arXiv:2505.05240·cs.CV·May 9, 2025

Open Access 4 Reviews

TL;DR

PADriver introduces a personalized autonomous driving framework leveraging multi-modal large language models, enabling scene understanding, risk assessment, and decision-making tailored to individual preferences, with comprehensive evaluation on a new highway benchmark.

Contribution

The paper presents PADriver, a novel personalized autonomous driving framework using MLLM, and introduces PAD-Highway, a new benchmark for evaluating decision performance in personalized driving.

Findings

01 PADriver outperforms state-of-the-art methods on evaluation metrics.

02 The framework enables various personalized driving modes.

03 Constructed PAD-Highway benchmark with 250 hours of annotated videos.

Abstract

In this paper, we propose PADriver, a novel closed-loop framework for personalized autonomous driving (PAD). Built upon a Multi-modal Large Language Model (MLLM), PADriver takes streaming frames and personalized textual prompts as inputs. It autoregressively performs scene understanding, danger level estimation, and action decision. The predicted danger level reflects the risk of the potential action and provides an explicit reference for the final action, which corresponds to the preset personalized prompt. Moreover, we construct a closed-loop benchmark named PAD-Highway, based on the Highway-Env simulator, to comprehensively evaluate decision performance under traffic rules. The dataset contains 250 hours of videos with high-quality annotation to facilitate the development of PAD behavior analysis. Experimental results on the constructed benchmark show that PADriver outperforms…
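As a rough illustration of how a predicted danger level could act as an explicit reference for the final action under a personalized mode, the sketch below gates a choice among actions by a per-mode danger tolerance. It is not the paper's method: in PADriver the MLLM generates the scene description, danger level, and action as text. The action names follow Highway-Env's discrete meta-actions, and the slow/normal/fast modes come from the reviews on this page; the 0 to 3 danger scale, the `MAX_DANGER` thresholds, the `PREFERENCE` orderings, and the `decide` helper are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Action names follow Highway-Env's discrete meta-actions.
ACTIONS = ("LANE_LEFT", "IDLE", "LANE_RIGHT", "FASTER", "SLOWER")

# Hypothetical: highest danger level each mode tolerates (0 = safe, 3 = critical).
MAX_DANGER = {"slow": 0, "normal": 1, "fast": 2}

# Hypothetical: the order in which each mode would like to try actions.
PREFERENCE = {
    "slow": ("SLOWER", "IDLE", "LANE_RIGHT", "LANE_LEFT", "FASTER"),
    "normal": ("IDLE", "FASTER", "SLOWER", "LANE_LEFT", "LANE_RIGHT"),
    "fast": ("FASTER", "LANE_LEFT", "LANE_RIGHT", "IDLE", "SLOWER"),
}


@dataclass
class StepOutput:
    scene: str    # scene description (would come from the model)
    danger: dict  # danger level per candidate action (would come from the model)
    action: str   # final decision


def decide(scene: str, danger: dict, mode: str) -> StepOutput:
    """Pick the most preferred action whose danger level the mode tolerates."""
    allowed = [a for a in PREFERENCE[mode] if danger[a] <= MAX_DANGER[mode]]
    if allowed:
        action = allowed[0]
    else:
        # Nothing is tolerable for this mode: fall back to the least dangerous action.
        action = min(ACTIONS, key=lambda a: danger[a])
    return StepOutput(scene, danger, action)


if __name__ == "__main__":
    danger = {"LANE_LEFT": 2, "IDLE": 1, "LANE_RIGHT": 3, "FASTER": 2, "SLOWER": 0}
    for mode in ("slow", "normal", "fast"):
        print(mode, decide("slow vehicle ahead in the ego lane", danger, mode).action)
```

With the same danger estimates, the three modes can pick different actions, which is the behavior the personalized prompt is meant to control. In a closed loop, the chosen action would be sent to the simulator and the next frame fed back to the model.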

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 5 · Confidence 4

Strengths

- The paper is motivated by personalized autonomous driving: a system that offers general driving-mode preferences adapted to the end user and changeable through a personal prompt, which is a great application direction.
- Defining the driving modes as slow, fast, and normal gives a well-balanced set of general preferences, and the framework, which uses the scene description and danger level to reach the action decision, is technically sound.
- The proposed

Weaknesses

- Previous methods focus on modeling individual drivers, which can offer a different kind of value as a data-driven personal driving design; this is not a drawback. The authors should put more focus on the method comparison, since general-preference personalization and individual personalization are two different application directions.
- The prompt setup is key to the framework, but its detailed contents are missing, e.g., traffic rule, as a general pre-set prompt, what the

Reviewer 02 · Rating 5 · Confidence 4

Strengths

1. The personalized autonomous driving topic is interesting and motivating.
2. The reported results demonstrate better performance than existing methods on the proposed benchmark.

Weaknesses

1. The visual environment in this simulation is very simple, which means the proposed method would not be practical in real-world applications. Additionally, it seems this paper mainly addresses motion planning. Given this, the visual environment should be more complicated if the visual encoder is used, for example by changing the shape and color of the simulated "cars". Otherwise, the visual encoder should be dropped and only the language modality used for motion planning.
2. The information between

Reviewer 03 · Rating 3 · Confidence 4

Strengths

1. PADriver introduces a novel closed-loop system for personalized autonomous driving, utilizing Multi-modal Large Language Models for comprehensive scene understanding and decision-making.
2. The introduction of a "danger level" metric to evaluate the risk of potential actions enhances the accuracy and safety of the driving decisions, setting PADriver apart from existing methods.
3. The PAD-Highway, a benchmark with 250 hours of high-quality annotated data, provides a robust platform for eval

Weaknesses

1. The current implementation of three driving styles (slow, fast, normal) is limited to speed variations. Expanding the range of driving styles could better reflect the "Personalized" aspect emphasized in the title.
2. The experiments are solely based on Vicuna-7B-1.5. It would be beneficial to test other LLM weights, such as LLaMA3, LLaVA, and Vicuna-13B-1.5, to evaluate performance across different models.
3. The driving personalities are currently represented through statistics. Providing

Reviewer 04 · Rating 3 · Confidence 4

Strengths

1. The paper is clearly motivated. Its goal is enabling personalized autonomous driving with an MLLM, which is an interesting direction and can potentially lead to interesting technical contributions or datasets.
2. The paper provides human-labeled logs in the proposed PAD-Highway dataset. In the PAD-Highway dataset, this work presents 25 hours of human-based driving logs and scores for driving experience in the HighwayEnv environment. I believe this collection of hum

Weaknesses

1. The necessity of making PADriver an MLLM for HighwayEnv is unclear. The most important contribution of this paper is the proposed PADriver, which is an MLLM trained for HighwayEnv. As mentioned in Line 044, the main reason is that a text description alone is not sufficient to capture enough scene information. Therefore, they use vision inputs with BEV rasterizations. However, the BEV image from HighwayEnv does not in fact provide any information beyond all the agent's locatio

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis