Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Long Chen; Oleg Sinavski; Jan Hünermann; Alice Karnsund; Andrew James Willmott; Danny Birch; Daniel Maund; Jamie Shotton

arXiv:2310.01957·cs.RO·October 17, 2023·6 cites

TL;DR

This paper presents a novel object-level multimodal LLM architecture for autonomous driving that fuses vectorized numeric data with language models, improving interpretability and decision-making in driving scenarios.

Contribution

It introduces a new multimodal LLM architecture, a large-scale driving QA dataset, and a pretraining strategy to align numeric modalities with language models, advancing explainable autonomous driving.

Findings

01. LLM-driver effectively interprets driving scenarios and answers questions.

02. The approach outperforms traditional behavioral cloning in driving action generation.

03. The dataset and benchmark facilitate further research in explainable autonomous driving.

Abstract

Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high-quality control commands collected with an RL agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of…

Code & Models

Repositories

wayveai/driving-with-llms (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: ALIGN