A Hierarchical Test Platform for Vision Language Model (VLM)-Integrated Real-World Autonomous Driving
Yupeng Zhou, Can Cui, Juntong Peng, Zichong Yang, Juanwu Lu, Jitesh H Panchal, Bin Yao, Ziran Wang

TL;DR
This paper presents a hierarchical real-world test platform for evaluating autonomous vehicles that integrate vision-language models (VLMs). The platform targets the domain shift problem and enables flexible, authentic scenario testing under real-world conditions.
Contribution
The paper introduces a modular, low-latency middleware and a configurable testing environment for comprehensive evaluation of VLM-based autonomous driving systems.
Findings
Demonstrates effective testing of VLM-enabled vehicles
Supports diverse real-world scenarios and conditions
Facilitates robust experimentation and evaluation
Abstract
Vision-Language Models (VLMs) have demonstrated notable promise in autonomous driving by offering the potential for multimodal reasoning through pretraining on extensive image-text pairs. However, adapting these models from broad web-scale data to the safety-critical context of driving presents a significant challenge, commonly referred to as domain shift. Existing simulation-based and dataset-driven evaluation methods, although valuable, often fail to capture the full complexity of real-world scenarios and cannot easily accommodate repeatable closed-loop testing with flexible scenario manipulation. In this paper, we introduce a hierarchical real-world test platform specifically designed to evaluate VLM-integrated autonomous driving systems. Our approach includes a modular, low-latency on-vehicle middleware that allows seamless incorporation of various VLMs, a clearly separated…
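The abstract's "modular, low-latency on-vehicle middleware that allows seamless incorporation of various VLMs" can be pictured with a small sketch. This is not the authors' implementation: the names `VLMBackend`, `Middleware`, `TimedResult`, and `EchoBackend` are hypothetical, and the design (a name-keyed registry behind one shared interface, with per-query timing) is only one assumed way such modularity might be arranged.

```python
import time
from dataclasses import dataclass
from typing import Dict, Protocol


class VLMBackend(Protocol):
    """Hypothetical interface: a backend maps a camera frame and a prompt to text."""

    def infer(self, frame: bytes, prompt: str) -> str: ...


@dataclass
class TimedResult:
    text: str         # the backend's response
    latency_s: float  # wall-clock time spent inside the backend call


class Middleware:
    """Hypothetical registry that lets VLM backends be swapped by name."""

    def __init__(self) -> None:
        self._backends: Dict[str, VLMBackend] = {}

    def register(self, name: str, backend: VLMBackend) -> None:
        self._backends[name] = backend

    def query(self, name: str, frame: bytes, prompt: str) -> TimedResult:
        backend = self._backends[name]  # raises KeyError for an unknown name
        start = time.perf_counter()
        text = backend.infer(frame, prompt)
        return TimedResult(text, time.perf_counter() - start)


class EchoBackend:
    """Stand-in for a real model (assumption): returns a canned string."""

    def infer(self, frame: bytes, prompt: str) -> str:
        return f"stub response to: {prompt}"


mw = Middleware()
mw.register("echo", EchoBackend())
result = mw.query("echo", b"\x00", "Is the lane ahead clear?")
print(result.text, f"({result.latency_s:.6f} s)")
```

Because every backend satisfies the same `infer` signature, replacing one model with another is a single `register` call, and the recorded latency gives a simple per-query figure for comparing backends.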
Taxonomy
Topics: Robotics and Automated Systems · Semantic Web and Ontologies · Robotics and Sensor-Based Localization
