MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning

Haoyu Fu; Diankun Zhang; Zongchuang Zhao; Jianfeng Cui; Hongwei Xie; Bing Wang; Guang Chen; Dingkang Liang; Xiang Bai

arXiv:2512.13636·cs.CV·February 6, 2026

MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning

Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Hongwei Xie, Bing Wang, Guang Chen, Dingkang Liang, Xiang Bai

PDF

Open Access

TL;DR

MindDrive introduces a novel vision-language-action framework for autonomous driving that leverages online reinforcement learning with a large language model to improve decision-making and exploration in complex scenarios.

Contribution

It is the first to apply online reinforcement learning to a vision-language-action model in autonomous driving, using a dual-LLM approach for decision-making and trajectory mapping.

Findings

01

Achieves a Driving Score of 78.04 on Bench2Drive

02

Attains a Success Rate of 55.09% on Bench2Drive

03

Demonstrates effective trial-and-error learning in complex driving scenarios

Abstract

Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. To overcome this limitation, we propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. The one LLM serves as a Decision Expert for scenario reasoning and driving decision-making, while the other acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. By feeding trajectory-level rewards back into the reasoning space, MindDrive…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMultimodal Machine Learning Applications · Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety