Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
Shaoyuan Xie, Lingdong Kong, Yuhao Dong, Chonghao Sima, Wenwei Zhang, Qi Alfred Chen, Ziwei Liu, Liang Pan

TL;DR
This paper evaluates the reliability of Vision-Language Models (VLMs) for autonomous driving, revealing their limitations in visual grounding and robustness, and proposes improved evaluation metrics and future directions for safer deployment.
Contribution
Introduction of DriveBench, a comprehensive benchmark dataset and evaluation framework for assessing VLM reliability in autonomous driving scenarios, which highlights current limitations and proposes improved evaluation metrics.
Findings
VLMs often rely on textual cues rather than true visual grounding.
VLMs are sensitive to input corruptions, which degrade their performance.
Current evaluation metrics may conceal reliability issues.
Abstract
Recent advancements in Vision-Language Models (VLMs) have sparked interest in their use for autonomous driving, particularly in generating interpretable driving decisions through natural language. However, the assumption that VLMs inherently provide visually grounded, reliable, and interpretable explanations for driving remains largely unexamined. To address this gap, we introduce DriveBench, a benchmark dataset designed to evaluate VLM reliability across 17 settings (clean, corrupted, and text-only inputs), encompassing 19,200 frames, 20,498 question-answer pairs, three question types, four mainstream driving tasks, and a total of 12 popular VLMs. Our findings reveal that VLMs often generate plausible responses derived from general knowledge or textual cues rather than true visual grounding, especially under degraded or missing visual inputs. This behavior, concealed by dataset…
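As a rough illustration of the grounding check the abstract describes, the sketch compares a model's score on clean inputs with its score on each degraded or text-only setting. If removing the image barely lowers the score, the answers are likely driven by textual cues rather than visual evidence. The setting keys (`clean`, `text_only`), the `tolerance` threshold, and both function names are hypothetical; this is not DriveBench's evaluation code or one of its metrics.

```python
from typing import Dict


def grounding_gaps(scores: Dict[str, float], clean_key: str = "clean") -> Dict[str, float]:
    """Return (clean score - setting score) for every non-clean setting.

    Hypothetical diagnostic: `scores` maps a setting name (e.g. a corruption
    type or a text-only run) to the model's score on that setting.
    """
    if clean_key not in scores:
        raise KeyError(f"missing reference setting {clean_key!r}")
    clean = scores[clean_key]
    # Positive gap = performance lost relative to clean inputs.
    return {name: clean - s for name, s in scores.items() if name != clean_key}


def flag_weak_grounding(
    scores: Dict[str, float],
    text_only_key: str = "text_only",
    clean_key: str = "clean",
    tolerance: float = 0.05,
) -> bool:
    """True if dropping the image costs less than `tolerance` (absolute score).

    The 0.05 default is an arbitrary illustrative threshold, not a value
    taken from the paper.
    """
    gaps = grounding_gaps(scores, clean_key)
    return gaps[text_only_key] < tolerance
```

A small clean-versus-text-only gap does not prove the model ignores the image, but it is the kind of signal that aggregate accuracy alone can hide, which is why per-setting comparisons are useful alongside a single headline score.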
Taxonomy
Topics: Transportation Planning and Optimization · Human-Automation Interaction and Safety · Vehicle emissions and performance
