Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs
Zeyu Dong, Yimin Zhu, Yansong Li, Kevin Mahon, Yu Sun

TL;DR
This paper introduces a novel architecture integrating multimodal large language models into end-to-end autonomous driving systems, improving real-world generalization without extensive fine-tuning or large datasets.
Contribution
The proposed architecture has the LLM generate high-level driving instructions only periodically, which reduces data requirements, mitigates LLM latency, and improves generalization in real-world environments.
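The periodic-instruction idea can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_closed_loop`, `query_llm`, `low_level_policy`, and `llm_period`, and the string-valued observations and actions, are all hypothetical stand-ins chosen for illustration. The sketch shows only the scheduling pattern, in which a fast policy acts at every step while a slower planner refreshes a high-level instruction every few steps. It runs synchronously, so it does not model how a real system would hide LLM inference time.

```python
from typing import Callable, List


def run_closed_loop(
    observations: List[str],
    query_llm: Callable[[str], str],
    low_level_policy: Callable[[str, str], str],
    llm_period: int = 5,
    default_instruction: str = "follow lane",
) -> List[str]:
    """Act at every step; refresh the high-level instruction every `llm_period` steps."""
    if llm_period < 1:
        raise ValueError("llm_period must be >= 1")
    instruction = default_instruction
    actions: List[str] = []
    for step, obs in enumerate(observations):
        # Hypothetical schedule: consult the (slow) LLM only on every llm_period-th step.
        if step % llm_period == 0:
            instruction = query_llm(obs)
        # The (fast) driving policy runs at every step, conditioned on the latest instruction.
        actions.append(low_level_policy(obs, instruction))
    return actions


# Toy stand-ins for the LLM and the driving policy (assumptions, not real models).
def toy_llm(obs: str) -> str:
    return "avoid obstacle" if "obstacle" in obs else "follow lane"


def toy_policy(obs: str, instruction: str) -> str:
    return "steer_left" if instruction == "avoid obstacle" else "go_straight"


if __name__ == "__main__":
    frames = ["clear", "obstacle", "obstacle", "clear"]
    print(run_closed_loop(frames, toy_llm, toy_policy, llm_period=2))
```

In this toy run the instruction is refreshed only at steps 0 and 2, so the policy keeps following the previous instruction in between. That stale interval is the trade-off a periodic scheme accepts in exchange for fewer LLM calls.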
Findings
Improved generalization in complex obstacle scenarios.
Reduced data collection requirements for training.
Enhanced robustness without fine-tuning LLMs.
Abstract
Traditional autonomous driving methods adopt a modular design, decomposing tasks into sub-tasks. In contrast, end-to-end autonomous driving directly outputs actions from raw sensor data, avoiding error accumulation. However, training an end-to-end model requires a comprehensive dataset; otherwise, the model exhibits poor generalization capabilities. Recently, large language models (LLMs) have been applied to enhance the generalization capabilities of end-to-end driving models. Most studies explore LLMs in an open-loop manner, where the output actions are compared to those of experts without direct feedback from the real world, while others examine closed-loop results only in simulations. This paper proposes an efficient architecture that integrates multimodal LLMs into end-to-end driving models operating in closed-loop settings in real-world environments. In our architecture, the LLM…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications
