Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models

Haohan Chi; Huan-ang Gao; Ziming Liu; Jianing Liu; Chenyu Liu; Jinwei Li; Kaisen Yang; Yangcheng Yu; Zeda Wang; Wenyi Li; Leichen Wang; Xingtao Hu; Hao Sun; Hang Zhao; Hao Zhao

arXiv:2505.23757 · cs.CV · May 30, 2025

TL;DR

Impromptu VLA introduces a large, curated dataset with rich annotations for vision-language-action models in autonomous driving. Models trained on it perform significantly better in unstructured scenarios, and its Q&A suite serves as a diagnostic tool.

Contribution

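The paper presents the Impromptu VLA Dataset: over 80,000 curated video clips organized under a novel taxonomy of four unstructured driving categories, with planning-oriented question-answering annotations and action trajectories for training and evaluating VLA models in complex driving environments.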

Findings

01 Performance gains on established benchmarks

02 Improved collision rates and trajectory prediction accuracy

03 Effective diagnostic via Q&A suite

Abstract

Vision-Language-Action (VLA) models for autonomous driving show promise but falter in unstructured corner-case scenarios, largely due to a scarcity of targeted benchmarks. To address this, we introduce Impromptu VLA. Our core contribution is the Impromptu VLA Dataset: over 80,000 meticulously curated video clips, distilled from over 2M source clips drawn from 8 open-source large-scale datasets. This dataset is built upon our novel taxonomy of four challenging unstructured categories and features rich, planning-oriented question-answering annotations and action trajectories. Crucially, experiments demonstrate that VLAs trained with our dataset achieve substantial performance gains on established benchmarks, improving closed-loop NeuroNCAP scores and collision rates and reaching near state-of-the-art L2 accuracy in open-loop nuScenes trajectory prediction. Furthermore, our Q&A suite…

Code & Models

Repositories

ahydchh/impromptu-vla · PyTorch · Official

Videos

Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications