Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving

Dapeng Zhang; Zhenlong Yuan; Zhangquan Chen; Chih-Ting Liao; Yinda Chen; Fei Shen; Qingguo Zhou; Tat-Seng Chua

arXiv:2511.19912 · cs.CV · November 26, 2025

Open Access

TL;DR

Reasoning-VLA is a fast, general vision-language-action model for autonomous driving that improves inference speed and generalization across diverse scenarios using a novel reasoning-based framework and learnable action queries.

Contribution

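The paper introduces Reasoning-VLA, a framework that pairs learnable action queries with a standardized, Chain-of-Thought reasoning-based data format consolidated from eight publicly available autonomous driving datasets, with the aim of improving inference efficiency and generalization in autonomous driving decision-making.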

Findings

01. Achieves state-of-the-art performance on multiple benchmarks.

02. Demonstrates superior generalization to new driving scenarios.

03. Provides fast inference suitable for real-time autonomous driving.

Abstract

Vision-Language-Action (VLA) models have recently shown strong decision-making capabilities in autonomous driving. However, existing VLAs often struggle with achieving efficient inference and generalizing to novel autonomous vehicle configurations and driving scenarios. In this paper, we propose Reasoning-VLA, a general and fast action-generation VLA framework. The proposed model employs a set of learnable action queries, initialized via Gaussian sampling from ground-truth trajectories within the training corpus. These learnable queries interact with reasoning-enhanced vision-language features to generate continuous action trajectories in parallel. To promote robust generalization, we consolidate eight publicly available autonomous driving datasets into a standardized, Chain-of-Thought reasoning-based, and easy-to-use data format for model training. Leveraging both supervised learning…
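
The query mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes one query per future waypoint, a per-waypoint Gaussian fitted to ground-truth (x, y) positions, a single cross-attention layer, and random untrained weights. The function names, the `w_in` lifting matrix, the parameter dictionary, and the synthetic data are all hypothetical. The one property it is meant to show is that all waypoints are decoded in a single pass rather than token by token.

```python
import numpy as np


def fit_waypoint_gaussian(gt_trajs):
    """Fit an independent Gaussian per waypoint coordinate.

    gt_trajs: (N, T, 2) array of ground-truth (x, y) waypoints.
    Returns per-waypoint mean and standard deviation, each of shape (T, 2).
    """
    mean = gt_trajs.mean(axis=0)
    std = gt_trajs.std(axis=0) + 1e-6  # avoid a zero scale
    return mean, std


def init_action_queries(mean, std, w_in, rng):
    """Sample one waypoint per time step and lift it to the feature width.

    Assumption: a linear map w_in (2, d) turns sampled waypoints into queries.
    In a trained model the resulting (T, d) array would be a learnable parameter.
    """
    sampled = rng.normal(loc=mean, scale=std)  # (T, 2)
    return sampled @ w_in                      # (T, d)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def decode_trajectory(queries, vl_feats, params):
    """Single cross-attention step: every query attends to all VL tokens.

    All T waypoints are produced by the same matrix products, i.e. in
    parallel, with no autoregressive loop over time steps.
    """
    q = queries @ params["wq"]
    k = vl_feats @ params["wk"]
    v = vl_feats @ params["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (T, M)
    fused = queries + attn @ v                      # residual connection
    return fused @ params["w_out"], attn            # (T, 2), (T, M)


def demo(seed=0, n=64, t=6, d=16, m=10):
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for ground-truth trajectories (not real driving data).
    gt = np.cumsum(rng.normal(0.5, 0.1, size=(n, t, 2)), axis=1)
    mean, std = fit_waypoint_gaussian(gt)
    w_in = rng.normal(0.0, 0.1, size=(2, d))
    queries = init_action_queries(mean, std, w_in, rng)
    # Random stand-in for reasoning-enhanced vision-language features.
    vl_feats = rng.normal(size=(m, d))
    params = {name: rng.normal(0.0, 0.1, size=(d, d)) for name in ("wq", "wk", "wv")}
    params["w_out"] = rng.normal(0.0, 0.1, size=(d, 2))
    traj, attn = decode_trajectory(queries, vl_feats, params)
    return queries, traj, attn


if __name__ == "__main__":
    queries, traj, attn = demo()
    print(queries.shape, traj.shape, attn.shape)
```

Because the weights are random, the output trajectory is meaningless. The sketch only shows the data flow: initialization draws on trajectory statistics, and the decoder emits the full continuous trajectory at once.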

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications