Devil is in Narrow Policy: Unleashing Exploration in Driving VLA Models

Canyu Chen; Yuguang Yang; Zhewen Tan; Yizhi Wang; Ruiyi Zhan; Haiyan Liu; Xuanyao Mao; Jason Bao; Xinyue Tang; Linlin Yang; Bingchuan Sun; Yan Wang; Baochang Zhang

arXiv:2603.06049 · cs.CV · March 9, 2026

TL;DR

This paper introduces Curious-VLA, a two-stage framework that improves exploration in driving Vision-Language-Action (VLA) models by generating diverse, physically valid trajectories during imitation learning and prioritizing high-diversity samples during reinforcement learning, reaching state-of-the-art results on the Navsim benchmark.

Contribution

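The paper proposes a two-stage approach that combines Feasible Trajectory Expansion (FTE) during imitation learning with Adaptive Diversity-Aware Sampling (ADAS) during reinforcement learning to overcome the Narrow Policy limitation in driving VLA models, improving both exploration and driving performance.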

Findings

01. Achieves state-of-the-art results on the Navsim benchmark

02. Effectively enhances exploration in VLA models

03. Demonstrates significant performance improvements

Abstract

We identify a fundamental Narrow Policy limitation undermining the performance of autonomous VLA models: driving Imitation Learning (IL) tends to collapse exploration and limits the potential of subsequent Reinforcement Learning (RL) stages, which often saturate prematurely due to insufficient feedback diversity. To address this, we propose Curious-VLA, a framework that alleviates the exploit-explore dilemma through a two-stage design. During IL, we introduce a Feasible Trajectory Expansion (FTE) strategy to generate multiple physically valid trajectories, together with a step-wise normalized trajectory representation to accommodate this diverse data. In the RL stage, we present Adaptive Diversity-Aware Sampling (ADAS), which prioritizes high-diversity samples, and introduce a Spanning Driving Reward (SDR) with focal-style weighting that amplifies the reward's value span to improve sensitivity to driving quality.…
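
The abstract does not spell out how ADAS scores diversity, so the following is only a minimal sketch of one way to prioritize high-diversity samples, not the authors' method. It assumes diversity is measured as the spread (population standard deviation) of rewards across several rollouts of the same scene. The names `diversity_scores`, `sampling_weights`, `sample_scenes`, the `floor` parameter, and the toy reward values are all hypothetical.

```python
import random
from statistics import pstdev


def diversity_scores(rollout_rewards):
    """Map each scene id to the population std of its rollout rewards.

    Assumption: reward spread across rollouts stands in for "diversity".
    """
    return {scene: pstdev(rewards) for scene, rewards in rollout_rewards.items()}


def sampling_weights(scores, floor=1e-3):
    """Turn diversity scores into normalized sampling probabilities.

    `floor` (hypothetical) keeps zero-diversity scenes from vanishing entirely.
    """
    raw = {scene: score + floor for scene, score in scores.items()}
    total = sum(raw.values())
    return {scene: value / total for scene, value in raw.items()}


def sample_scenes(weights, k, seed=0):
    """Draw k scene ids with replacement, proportional to their weights."""
    rng = random.Random(seed)
    scenes = list(weights)
    return rng.choices(scenes, weights=[weights[s] for s in scenes], k=k)


# Toy rewards for four rollouts per scene (made-up numbers for illustration).
rollout_rewards = {
    "scene_a": [0.9, 0.9, 0.9, 0.9],  # identical rollouts: no feedback diversity
    "scene_b": [0.1, 0.9, 0.5, 0.3],  # widely varying rollouts
    "scene_c": [0.4, 0.6, 0.5, 0.5],  # mildly varying rollouts
}

weights = sampling_weights(diversity_scores(rollout_rewards))
batch = sample_scenes(weights, k=8)
```

In this toy setup `scene_a`, whose rollouts all earn the same reward, receives the smallest sampling weight, while `scene_b` receives the largest. This mirrors the abstract's point that RL saturates when feedback lacks diversity.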

Code & Models

Models

🤗 MashiroLn/Curious-VLA (model · 72 dl · ♡ 2)

Taxonomy

Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Domain Adaptation and Few-Shot Learning