DriveMind: A Dual Visual Language Model-based Reinforcement Learning Framework for Autonomous Driving

Dawood Wasif; Terrence J. Moore; Chandan K. Reddy; Frederica Free-Nelson; Seunghyun Yoon; Hyuk Lim; Dan Dongseong Kim; Jin-Hee Cho

arXiv:2506.00819 · cs.RO · March 16, 2026

TL;DR

DriveMind introduces a novel reinforcement learning framework for autonomous driving that combines semantic understanding, dynamic prompt generation, safety constraints, and world modeling to improve performance and generalization in complex environments.

Contribution

The paper presents DriveMind, a unified semantic reward framework integrating dual vision-language models, dynamic prompt generation, safety modules, and world models for enhanced autonomous driving.

Findings

01. Achieves an average speed of 19.4 km/h and a route completion of 0.98 in CARLA.

02. Outperforms baselines by over 4% in success rate.

03. Generalizes zero-shot to real dash-cam data with minimal distribution shift.

Abstract

End-to-end autonomous driving systems map sensor data directly to control commands, but remain opaque, lack interpretability, and offer no formal safety guarantees. While recent vision-language-guided reinforcement learning (RL) methods introduce semantic feedback, they often rely on static prompts and fixed objectives, limiting adaptability to dynamic driving scenes. We present DriveMind, a unified semantic reward framework that integrates: (i) a contrastive Vision-Language Model (VLM) encoder for stepwise semantic anchoring; (ii) a novelty-triggered VLM encoder-decoder, fine-tuned via chain-of-thought (CoT) distillation, for dynamic prompt generation upon semantic drift; (iii) a hierarchical safety module enforcing kinematic constraints (e.g., speed, lane centering, stability); and (iv) a compact predictive world model to reward alignment with anticipated ideal states. DriveMind…
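The four components listed in the abstract can be illustrated with a minimal sketch of how such signals might be combined into one scalar reward. This excerpt does not give the paper's actual reward formulation, so everything below is an assumption made for illustration: the function names, the thresholds, the weights, the use of cosine similarity over precomputed embeddings, and the choice to gate the reward multiplicatively by a safety factor.

```python
import math
from dataclasses import dataclass


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def needs_new_prompt(obs_emb, anchor_emb, drift_threshold=0.8):
    """Hypothetical novelty trigger: flag semantic drift when the current
    observation embedding is no longer similar to the anchor embedding."""
    return cosine(obs_emb, anchor_emb) < drift_threshold


@dataclass
class Kinematics:
    speed_kmh: float
    lane_offset_m: float
    yaw_rate: float


def safety_factor(k, speed_limit_kmh=30.0, max_offset_m=1.5, max_yaw_rate=0.5):
    """Hypothetical hierarchical check: a hard speed gate is evaluated first,
    then soft lane-centering and stability terms scale the result in [0, 1]."""
    if k.speed_kmh > speed_limit_kmh:
        return 0.0
    centering = max(0.0, 1.0 - abs(k.lane_offset_m) / max_offset_m)
    stability = max(0.0, 1.0 - abs(k.yaw_rate) / max_yaw_rate)
    return centering * stability


def world_model_reward(predicted_state, ideal_state):
    """Reward in (0, 1] that decays with the Euclidean distance between the
    predicted next state and the anticipated ideal state."""
    dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(predicted_state, ideal_state)))
    return math.exp(-dist)


def combined_reward(obs_emb, prompt_emb, k, predicted_state, ideal_state,
                    w_sem=0.5, w_wm=0.5):
    """Assumed combination: weighted semantic and predictive terms, scaled by
    the safety factor so that a hard violation zeroes the reward."""
    semantic = cosine(obs_emb, prompt_emb)
    predictive = world_model_reward(predicted_state, ideal_state)
    return safety_factor(k) * (w_sem * semantic + w_wm * predictive)
```

In this sketch, a step where `needs_new_prompt` returns `True` would be the point at which a system regenerates its prompt embedding before computing `combined_reward`. In a real system the embeddings would come from the VLM encoders, which are not modeled here.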

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Traffic control and management

Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator