DriveFine: Refining-Augmented Masked Diffusion VLA for Precise and Robust Driving

Chenxu Dang; Sining Ang; Yongkang Li; Haochen Tian; Jie Wang; Guang Li; Hangjun Ye; Jie Ma; Long Chen; Yan Wang

arXiv:2602.14577 · cs.CV · February 17, 2026

TL;DR

DriveFine is a masked diffusion VLA model for autonomous driving with a plug-and-play refinement expert, designed to improve robustness and decoding flexibility over existing VLA models.

Contribution

The paper presents DriveFine, a masked diffusion VLA model with a decoupled refinement expert and a hybrid reinforcement learning strategy, improving both performance and generalization.

Findings

01. Outperforms existing models on the NAVSIM v1, NAVSIM v2, and Navhard benchmarks.

02. Demonstrates strong robustness and efficacy in autonomous driving tasks.

03. Shows effective self-correction and flexible decoding capabilities.

Abstract

Vision-Language-Action (VLA) models for autonomous driving increasingly adopt generative planners trained with imitation learning followed by reinforcement learning. Diffusion-based planners suffer from modality alignment difficulties, low training efficiency, and limited generalization. Token-based planners are plagued by cumulative causal errors and irreversible decoding. In summary, the two dominant paradigms exhibit complementary strengths and weaknesses. In this paper, we propose DriveFine, a masked diffusion VLA model that combines flexible decoding with self-correction capabilities. In particular, we design a novel plug-and-play block-MoE, which seamlessly injects a refinement expert on top of the generation expert. By enabling explicit expert selection during inference and gradient blocking during training, the two experts are fully decoupled, preserving the foundational…
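The abstract contrasts irreversible left-to-right token decoding with masked diffusion decoding, which can commit tokens in any order and revisit them later. The sketch below is a generic, MaskGIT-style illustration of that difference under stated assumptions, not DriveFine's implementation: `toy_predict`, `TARGET`, the even reveal schedule, and the confidence-threshold remasking in `refine` are all hypothetical stand-ins. DriveFine itself performs refinement with a dedicated refinement expert inside its block-MoE, which this toy does not model.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # marker for a still-masked token position

# Hypothetical predictor interface: given a partially masked sequence, return a
# (token, confidence) guess for every position. A real planner uses a network.
Predictor = Callable[[List[Optional[int]]], List[Tuple[int, float]]]


def masked_diffusion_decode(predict: Predictor, length: int, steps: int) -> List[int]:
    """Parallel decoding: start fully masked and, over `steps` rounds, commit
    the most confident predictions in any order rather than left to right."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    seq: List[Optional[int]] = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t is MASK]
        if not masked:
            break
        preds = predict(seq)
        # Reveal an even share of the remaining positions; the final round
        # (steps - step == 1) reveals everything that is still masked.
        k = max(1, len(masked) // (steps - step))
        for i in sorted(masked, key=lambda j: preds[j][1], reverse=True)[:k]:
            seq[i] = preds[i][0]
    return list(seq)


def refine(predict: Predictor, seq: List[int], threshold: float) -> List[int]:
    """Self-correction pass (hypothetical): re-mask committed tokens whose
    confidence is below `threshold` and predict them again. A causal decoder
    cannot revisit tokens it has already emitted."""
    scores = predict(list(seq))
    low = {i for i, (_, conf) in enumerate(scores) if conf < threshold}
    draft: List[Optional[int]] = [MASK if i in low else t for i, t in enumerate(seq)]
    preds = predict(draft)
    for i in low:
        draft[i] = preds[i][0]
    return list(draft)


TARGET = [3, 1, 4, 1, 5, 9]  # hypothetical "correct" trajectory tokens


def toy_predict(seq: List[Optional[int]]) -> List[Tuple[int, float]]:
    """Stand-in model: always proposes TARGET, with low confidence on any
    committed token that disagrees with it."""
    return [
        (TARGET[i], 0.1 if (t is not MASK and t != TARGET[i]) else 0.9)
        for i, t in enumerate(seq)
    ]


if __name__ == "__main__":
    print(masked_diffusion_decode(toy_predict, length=6, steps=3))  # [3, 1, 4, 1, 5, 9]
    print(refine(toy_predict, [3, 1, 7, 1, 5, 9], threshold=0.5))   # [3, 1, 4, 1, 5, 9]
```

In the second call, the wrong token at position 2 receives low confidence, is re-masked, and is re-predicted; an autoregressive decoder would have had to keep it and condition every later token on it.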

Taxonomy

Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications