AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

Zewei Zhou; Tianhui Cai; Seth Z. Zhao; Yun Zhang; Zhiyu Huang; Bolei Zhou; Jiaqi Ma

arXiv:2506.13757 · cs.CV · November 7, 2025

TL;DR

AutoVLA is an end-to-end autonomous driving model that unifies reasoning and action generation in a single autoregressive model, using adaptive reasoning and reinforcement fine-tuning to improve performance.

Contribution

The paper introduces AutoVLA, a vision-language-action model that unifies reasoning and action generation in a single autoregressive model and uses reinforcement fine-tuning to make its reasoning adaptive and efficient.

Findings

01 AutoVLA achieves competitive results on the nuPlan, nuScenes, and Waymo datasets and in the CARLA simulator.

02 The model demonstrates effective adaptive reasoning in diverse driving scenarios.

03 Reinforcement fine-tuning reduces unnecessary reasoning in simple cases.

Abstract

Recent advancements in Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving by leveraging world knowledge and reasoning capabilities. However, current VLA models often struggle with physically infeasible action outputs, complex model structures, or unnecessarily long reasoning. In this paper, we propose AutoVLA, a novel VLA model that unifies reasoning and action generation within a single autoregressive generation model for end-to-end autonomous driving. AutoVLA performs semantic reasoning and trajectory planning directly from raw visual inputs and language instructions. We tokenize continuous trajectories into discrete, feasible actions, enabling direct integration into the language model. For training, we employ supervised fine-tuning to equip the model with dual thinking modes: fast thinking (trajectory-only) and slow thinking (enhanced with…
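The trajectory tokenization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's tokenizer: the five-entry codebook, the (distance, heading change) parameterization, and the function names tokenize and detokenize are all assumptions made for illustration. Each token stands for a short motion primitive, so any token sequence decodes to a trajectory built only from those primitives.

```python
import math

# Hypothetical codebook (an assumption, not taken from the paper): each action
# token is a motion primitive (forward distance in metres, heading change in
# radians) applied over one planning step.
CODEBOOK = [
    (0.0, 0.0),   # token 0: stay still
    (1.0, 0.0),   # token 1: straight, 1 m
    (2.0, 0.0),   # token 2: straight, 2 m
    (1.0, 0.2),   # token 3: gentle left
    (1.0, -0.2),  # token 4: gentle right
]


def step(pose, token):
    """Apply one primitive to a pose (x, y, heading) and return the new pose."""
    x, y, heading = pose
    dist, dtheta = CODEBOOK[token]
    heading = heading + dtheta
    return (x + dist * math.cos(heading), y + dist * math.sin(heading), heading)


def tokenize(waypoints):
    """Greedily map (x, y) waypoints to codebook indices.

    The pose is advanced with the chosen primitive, not the target waypoint,
    so quantization error does not accumulate unnoticed.
    """
    pose = (0.0, 0.0, 0.0)
    tokens = []
    for tx, ty in waypoints:
        best_token, best_err, best_pose = 0, float("inf"), pose
        for token in range(len(CODEBOOK)):
            cand = step(pose, token)
            err = math.hypot(cand[0] - tx, cand[1] - ty)
            if err < best_err:
                best_token, best_err, best_pose = token, err, cand
        tokens.append(best_token)
        pose = best_pose
    return tokens


def detokenize(tokens):
    """Decode a token sequence back into (x, y) waypoints."""
    pose = (0.0, 0.0, 0.0)
    waypoints = []
    for token in tokens:
        pose = step(pose, token)
        waypoints.append((pose[0], pose[1]))
    return waypoints
```

In this sketch the token indices play the role of extra vocabulary entries that an autoregressive language model could emit after its text tokens. A realistic codebook would be much larger and would likely be derived from driving data rather than written by hand.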

Code & Models

Repositories

ucla-mobility/AutoVLA (official)

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Robotic Path Planning Algorithms · Reinforcement Learning in Robotics

Methods: Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator