PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving

Xuewen Luo, Fan Ding, Yinsheng Song, Xiaofeng Zhang, and Junnyong Loo

arXiv:2412.02025 · cs.RO · December 4, 2024

TL;DR

This paper introduces PKRD-CoT, a unified zero-shot prompt framework for multi-modal large language models in autonomous driving, enhancing decision-making by mimicking human reasoning without prior task-specific training.

Contribution

It proposes a novel prompt design that integrates perception, knowledge, reasoning, and decision-making, enabling MLLMs to perform autonomous driving tasks effectively in unstructured environments.

Findings

01. GPT-4.0 with PKRD-CoT outperforms baseline models in autonomous driving tasks.

02. PKRD-CoT is effective across multiple MLLMs, such as Claude and LLaVA-1.6.

03. The framework applies across a variety of multi-modal large language models.

Abstract

There is growing interest in leveraging the capabilities of robust Multi-Modal Large Language Models (MLLMs) directly within autonomous driving contexts. However, the high costs and complexity of designing and training end-to-end autonomous driving models make them challenging for many enterprises and research entities. To address this, our study explores a seamless integration of MLLMs into autonomous driving systems by proposing a Zero-Shot Chain-of-Thought (Zero-Shot-CoT) prompt design named PKRD-CoT. PKRD-CoT is based on the four fundamental capabilities of autonomous driving: perception, knowledge, reasoning, and decision-making. This makes it particularly suitable for understanding and responding to dynamic driving environments by mimicking human thought processes step by step, thus enhancing decision-making in real-time scenarios. Our design enables MLLMs to tackle problems…
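The four-stage structure described in the abstract can be illustrated with a small prompt builder. This is a minimal sketch, not the paper's actual prompt: the stage names (perception, knowledge, reasoning, decision-making) come from the abstract, while the function name `build_pkrd_prompt`, the instruction wording for each stage, and the framing sentence are assumptions made for illustration. The prompt is zero-shot in the sense that it contains no worked examples, only a step-by-step instruction.

```python
# Hypothetical sketch of a PKRD-style zero-shot chain-of-thought prompt.
# Stage names follow the abstract; the instruction text for each stage is
# an assumption for illustration and is NOT taken from the paper.
STAGES = [
    ("Perception", "Describe the road users, lanes, signals, and obstacles in the scene."),
    ("Knowledge", "State the traffic rules and driving conventions that apply here."),
    ("Reasoning", "Reason step by step about how the scene may evolve and which risks follow."),
    ("Decision", "Choose one driving action and justify it briefly."),
]


def build_pkrd_prompt(scene_description: str) -> str:
    """Assemble a zero-shot prompt that walks through the four stages in order."""
    lines = [
        "You are assisting an autonomous vehicle.",
        f"Scene: {scene_description}",
        "",
        "Let's think step by step.",  # common zero-shot CoT trigger phrase
    ]
    for number, (name, instruction) in enumerate(STAGES, start=1):
        lines.append(f"{number}. {name}: {instruction}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pkrd_prompt("A pedestrian is crossing ahead at an unmarked junction."))
```

In practice the resulting string would be sent to an MLLM together with the camera image; that call is omitted here because it depends on the model provider.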

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques

Methods: Attention Is All You Need · Adam · Position-Wise Feed-Forward Layer · Linear Layer · Softmax · Multi-Head Attention · Byte Pair Encoding · Label Smoothing · Dropout · Dense Connections