Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang; Tao Tang; Shaoxiang Chen; Sihao Lin; Zequn Jie; Lin Ma; Guangrun Wang; Xiaodan Liang

arXiv:2408.13890·cs.CV·August 27, 2024

Open Access

TL;DR

This paper introduces RDA-Driver, a multimodal LLM-based autonomous driving model that aligns reasoning and decision-making, improving interpretability and performance in complex traffic scenarios.

Contribution

It proposes a novel end-to-end decision-making framework that enforces alignment between reasoning and decisions and uses redesigned chain-of-thought (CoT) reasoning to improve scene understanding and planning.
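The page does not say how the alignment is enforced. Purely as a generic illustration, and assuming access to the model's per-token log-probabilities, one standard way to make a model prefer a reasoning chain paired with its consistent decision over the same chain paired with a contradictory decision is a margin ranking (hinge) loss. The function names, the length normalization and the margin value below are all assumptions for illustration, not the paper's formulation.

```python
# Hypothetical illustration, not the RDA-Driver loss: a margin ranking
# (hinge) objective over length-normalized sequence log-likelihoods.

def mean_log_likelihood(token_log_probs):
    """Average per-token log-probability the model assigns to a sequence."""
    if not token_log_probs:
        raise ValueError("sequence must contain at least one token")
    return sum(token_log_probs) / len(token_log_probs)


def alignment_margin_loss(aligned, misaligned, margin=1.0):
    """Zero once the consistent (reasoning, decision) pair outscores the
    inconsistent pair by at least `margin`; grows linearly otherwise."""
    gap = mean_log_likelihood(aligned) - mean_log_likelihood(misaligned)
    return max(0.0, margin - gap)


# Made-up per-token log-probs for a CoT followed by its matching decision,
# and for the same CoT followed by a contradictory decision.
aligned = [-0.1, -0.2, -0.3]
misaligned = [-0.5, -1.0]
print(alignment_margin_loss(aligned, misaligned))  # gap 0.55, loss ~0.45
```

In an actual training setup the log-probabilities would come from the LLM and the loss would be added to the usual language-modeling objective and backpropagated; plain floats are used here only to show how the hinge behaves.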

Findings

01. Achieves state-of-the-art planning performance on nuScenes, with an average L2 error of 0.80 m.

02. Achieves leading results on DriveLM-nuScenes, with an average L2 error of 0.82 m.

03. Improves the explainability and accuracy of decision-making in autonomous driving.
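The L2 error reported above measures how far the planned ego trajectory deviates from the ground-truth trajectory. A minimal sketch of the simplest version of the metric follows, assuming (x, y) waypoints in meters sampled at 2 Hz over a 3 s horizon and averaging the errors at the 1 s, 2 s and 3 s waypoints. Published protocols differ in details (for example, whether errors are averaged over all waypoints up to each horizon), so this is not the exact evaluation code used for the reported numbers; the helper names and trajectory values are hypothetical.

```python
import math

# Hypothetical helper (not from the paper): Euclidean (L2) distance between
# each planned waypoint and its ground-truth counterpart, both given as
# (x, y) positions in meters in the same ego-centered frame.
def l2_errors(pred, gt):
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of waypoints")
    return [math.dist(p, g) for p, g in zip(pred, gt)]


# Mean of the per-waypoint errors at the selected horizon indices.
def average_l2(pred, gt, horizon_indices):
    errs = l2_errors(pred, gt)
    return sum(errs[i] for i in horizon_indices) / len(horizon_indices)


# Made-up 3 s trajectory sampled at an assumed 2 Hz (6 waypoints).
gt = [(0.0, 2.0), (0.0, 4.0), (0.0, 6.0), (0.0, 8.0), (0.0, 10.0), (0.0, 12.0)]
pred = [(0.0, 2.0), (0.0, 4.5), (0.0, 6.0), (0.0, 9.0), (0.0, 10.0), (0.9, 13.2)]

# At 2 Hz the 1 s, 2 s and 3 s waypoints sit at indices 1, 3 and 5.
print(average_l2(pred, gt, [1, 3, 5]))
```

Here the per-horizon errors are 0.5 m, 1.0 m and 1.5 m, so the average is about 1.0 m; a lower value means the planned trajectory stays closer to the human-driven one.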

Abstract

Data-driven approaches for autonomous driving (AD) have been widely adopted over the past decade but suffer from dataset bias and a lack of interpretability. Inspired by the knowledge-driven nature of human driving, recent approaches explore the potential of large language models (LLMs) to improve understanding and decision-making in traffic scenarios. They find that fine-tuning pretrained LLMs on downstream data with Chain-of-Thought (CoT) reasoning can enhance explainability and scene understanding. However, this popular strategy suffers from a well-known problem: the crafted CoTs can be misaligned with the subsequent decision-making, an issue that previous LLM-based AD methods have left unaddressed. To address this problem, we propose an end-to-end decision-making model based on a multimodality-augmented LLM, which simultaneously executes CoT reasoning…

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies