DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

TL;DR
DriveVLM introduces a vision-language model-based system for autonomous driving that enhances scene understanding and planning, combining reasoning modules with traditional pipelines for improved performance in complex urban scenarios.
Contribution
The paper presents DriveVLM, a novel autonomous driving system integrating vision-language models with reasoning modules, and DriveVLM-Dual, a hybrid system addressing VLM limitations for real-world deployment.
Findings
Effective in complex urban scenarios
Improves scene understanding and planning
Validated on real-world vehicle deployment
Abstract
A primary hurdle of autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities. DriveVLM integrates a unique combination of reasoning modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing the limitations of VLMs in spatial reasoning and heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that synergizes the strengths of DriveVLM with the traditional autonomous driving pipeline. Experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the efficacy of DriveVLM and DriveVLM-Dual in handling complex and unpredictable driving conditions. Finally, we deploy the…
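The three reasoning modules named in the abstract (scene description, scene analysis, hierarchical planning) can be read as a chain in which each stage consumes the previous stage's output. The sketch below illustrates only that ordering. It is not the authors' implementation: the `VLM` callable type, `run_reasoning_chain`, the prompt strings, and `echo_vlm` are all hypothetical names introduced here, and a real system would pass images and structured outputs rather than plain strings.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a vision-language model call:
# takes a text prompt and returns text. Not an API from the paper.
VLM = Callable[[str], str]


def run_reasoning_chain(vlm: VLM, observation: str) -> Dict[str, str]:
    """Chain the three stages named in the abstract, in order."""
    # Stage 1: scene description from the raw observation.
    description = vlm(f"Describe the scene: {observation}")
    # Stage 2: scene analysis conditioned on the description.
    analysis = vlm(f"Analyze the scene: {description}")
    # Stage 3: hierarchical planning conditioned on the analysis.
    plan = vlm(f"Plan given analysis: {analysis}")
    return {"description": description, "analysis": analysis, "plan": plan}


def echo_vlm(prompt: str) -> str:
    """Toy model (assumption): returns the instruction part of the prompt as a tag."""
    return f"<{prompt.split(':')[0].lower()}>"


result = run_reasoning_chain(echo_vlm, "rainy intersection")
```

With the toy `echo_vlm`, each stage simply reports which instruction it received, which makes the data flow between stages easy to inspect before swapping in a real model.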
Taxonomy
Topics: Multimodal Machine Learning Applications
