Bench2ADVLM: A Closed-Loop Benchmark for Vision-language Models in Autonomous Driving
Tianyuan Zhang, Ting Jin, Lu Wang, Jiangfan Liu, Siyuan Liang, Mingchuan Zhang, Aishan Liu, Xianglong Liu

TL;DR
Bench2ADVLM is a comprehensive closed-loop evaluation framework for vision-language models in autonomous driving, enabling real-time testing in simulation and physical environments to better assess safety and performance.
Contribution
The paper introduces a hierarchical closed-loop evaluation pipeline and a physical testing platform for ADVLMs, addressing the gap left by current open-loop assessment methods.
Findings
Existing ADVLMs show limited performance in closed-loop settings.
The framework uncovers potential failure modes in real-world scenarios.
Physical testing validates simulation-based evaluations.
Abstract
Vision-Language Models (VLMs) have recently emerged as a promising paradigm in autonomous driving (AD). However, current performance evaluation protocols for VLM-based AD systems (ADVLMs) are predominantly confined to open-loop settings with static inputs, neglecting the more realistic and informative closed-loop setting that captures interactive behavior, feedback resilience, and real-world safety. To address this, we introduce Bench2ADVLM, a unified hierarchical closed-loop evaluation framework for real-time, interactive assessment of ADVLMs across both simulation and physical platforms. Inspired by dual-process theories of cognition, we first adapt diverse ADVLMs to simulation environments via a dual-system adaptation architecture. In this design, heterogeneous high-level driving commands generated by target ADVLMs (fast system) are interpreted by a general-purpose VLM (slow system)…
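The open-loop versus closed-loop distinction drawn in the abstract can be illustrated with a toy sketch. This is not the Bench2ADVLM pipeline. It assumes a one-dimensional lane-offset state, and every name in it (`expert`, `brittle_policy`, `open_loop_error`, `closed_loop_rollout`, `LOGGED_OFFSETS`) is hypothetical. Open-loop scoring compares a policy with a reference on fixed logged states. Closed-loop scoring feeds each action back into the state, so the policy has to cope with the states it produces itself.

```python
from typing import Callable, List

# A policy maps the current lateral offset from the lane centre (metres)
# to a lateral correction applied during the next step (metres).
Policy = Callable[[float], float]

# Hypothetical logged states, all close to the lane centre.
LOGGED_OFFSETS: List[float] = [-0.4, -0.2, 0.0, 0.2, 0.4]


def expert(offset: float) -> float:
    """Reference behaviour: halve the offset at every step."""
    return -0.5 * offset


def brittle_policy(offset: float) -> float:
    """Hypothetical model under test: matches the expert on states that
    resemble the log, but applies no correction further from the centre."""
    return -0.5 * offset if abs(offset) <= 0.5 else 0.0


def open_loop_error(policy: Policy, offsets: List[float]) -> float:
    """Mean absolute gap between policy and expert on fixed, static inputs."""
    return sum(abs(policy(o) - expert(o)) for o in offsets) / len(offsets)


def closed_loop_rollout(policy: Policy, start: float, steps: int) -> List[float]:
    """Roll the policy forward, feeding each action back into the state."""
    offset = start
    trajectory = [offset]
    for _ in range(steps):
        offset = offset + policy(offset)  # the action changes the next input
        trajectory.append(offset)
    return trajectory
```

In this toy setting, `open_loop_error(brittle_policy, LOGGED_OFFSETS)` is zero, yet a closed-loop rollout started at an offset of 1.0 m never returns to the lane centre, while the expert converges. This is the kind of failure that static inputs cannot reveal.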
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Human-Automation Interaction and Safety · Formal Methods in Verification
