Bench2ADVLM: A Closed-Loop Benchmark for Vision-language Models in Autonomous Driving

Tianyuan Zhang; Ting Jin; Lu Wang; Jiangfan Liu; Siyuan Liang; Mingchuan Zhang; Aishan Liu; Xianglong Liu

arXiv:2508.02028 · cs.CV · August 21, 2025

TL;DR

Bench2ADVLM is a comprehensive closed-loop evaluation framework for vision-language models in autonomous driving, enabling real-time testing in simulation and physical environments to better assess safety and performance.

Contribution

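The paper introduces a hierarchical closed-loop evaluation pipeline and a physical testing platform for VLM-based autonomous driving systems (ADVLMs), addressing the limitation that current evaluations are largely confined to open-loop settings with static inputs.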

Findings

1. Existing ADVLMs show limited performance in closed-loop settings.

2. The framework uncovers potential failure modes in real-world scenarios.

3. Physical testing validates simulation-based evaluations.

Abstract

Vision-Language Models (VLMs) have recently emerged as a promising paradigm in autonomous driving (AD). However, current performance evaluation protocols for VLM-based AD systems (ADVLMs) are predominantly confined to open-loop settings with static inputs, neglecting the more realistic and informative closed-loop setting that captures interactive behavior, feedback resilience, and real-world safety. To address this, we introduce Bench2ADVLM, a unified hierarchical closed-loop evaluation framework for real-time, interactive assessment of ADVLMs across both simulation and physical platforms. Inspired by dual-process theories of cognition, we first adapt diverse ADVLMs to simulation environments via a dual-system adaptation architecture. In this design, heterogeneous high-level driving commands generated by target ADVLMs (fast system) are interpreted by a general-purpose VLM (slow system)…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Human-Automation Interaction and Safety · Formal Methods in Verification