A Hierarchical Test Platform for Vision Language Model (VLM)-Integrated Real-World Autonomous Driving

Yupeng Zhou; Can Cui; Juntong Peng; Zichong Yang; Juanwu Lu; Jitesh H Panchal; Bin Yao; Ziran Wang

arXiv:2506.14100 · cs.RO · June 18, 2025

Open Access

TL;DR

This paper presents a hierarchical real-world testing platform for evaluating vision-language model-integrated autonomous vehicles, addressing domain shift challenges and enabling flexible, authentic scenario testing in real-world conditions.

Contribution

The paper introduces a modular, low-latency middleware and a configurable testing environment for comprehensive evaluation of VLM-based autonomous driving systems.

Findings

01 Effective testing of VLM-enabled vehicles demonstrated
02 Supports diverse real-world scenarios and conditions
03 Facilitates robust experimentation and evaluation

Abstract

Vision-Language Models (VLMs) have demonstrated notable promise in autonomous driving by offering the potential for multimodal reasoning through pretraining on extensive image-text pairs. However, adapting these models from broad web-scale data to the safety-critical context of driving presents a significant challenge, commonly referred to as domain shift. Existing simulation-based and dataset-driven evaluation methods, although valuable, often fail to capture the full complexity of real-world scenarios and cannot easily accommodate repeatable closed-loop testing with flexible scenario manipulation. In this paper, we introduce a hierarchical real-world test platform specifically designed to evaluate VLM-integrated autonomous driving systems. Our approach includes a modular, low-latency on-vehicle middleware that allows seamless incorporation of various VLMs, a clearly separated…

Taxonomy

Topics: Robotics and Automated Systems · Semantic Web and Ontologies · Robotics and Sensor-Based Localization