DriveMind: A Dual Visual Language Model-based Reinforcement Learning Framework for Autonomous Driving
Dawood Wasif, Terrence J. Moore, Chandan K. Reddy, Frederica Free-Nelson, Seunghyun Yoon, Hyuk Lim, Dan Dongseong Kim, Jin-Hee Cho

TL;DR
DriveMind introduces a novel reinforcement learning framework for autonomous driving that combines semantic understanding, dynamic prompt generation, safety constraints, and world modeling to improve performance and generalization in complex environments.
Contribution
The paper presents DriveMind, a unified semantic reward framework integrating dual vision-language models, dynamic prompt generation, safety modules, and world models for enhanced autonomous driving.
Findings
Achieves 19.4 km/h average speed and 0.98 route completion in CARLA.
Outperforms baselines by over 4% in success rate.
Generalizes zero-shot to real dash-cam data with minimal distributional shift.
Abstract
End-to-end autonomous driving systems map sensor data directly to control commands, but remain opaque, lack interpretability, and offer no formal safety guarantees. While recent vision-language-guided reinforcement learning (RL) methods introduce semantic feedback, they often rely on static prompts and fixed objectives, limiting adaptability to dynamic driving scenes. We present DriveMind, a unified semantic reward framework that integrates: (i) a contrastive Vision-Language Model (VLM) encoder for stepwise semantic anchoring; (ii) a novelty-triggered VLM encoder-decoder, fine-tuned via chain-of-thought (CoT) distillation, for dynamic prompt generation upon semantic drift; (iii) a hierarchical safety module enforcing kinematic constraints (e.g., speed, lane centering, stability); and (iv) a compact predictive world model to reward alignment with anticipated ideal states. DriveMind…
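The four components listed in the abstract can be pictured as terms of a single per-step reward. The sketch below is illustrative only and is not the authors' implementation: the function names, the cosine-similarity scoring, the multiplicative safety gate, the thresholds, and the equal weights are all assumptions, and the embeddings are stand-in vectors rather than real VLM outputs.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def needs_new_prompt(frame_emb: Vector, seen_embs: Sequence[Vector],
                     threshold: float = 0.8) -> bool:
    """Hypothetical novelty trigger: True when the frame is dissimilar to
    every stored embedding, i.e. the scene has drifted semantically."""
    return all(cosine(frame_emb, e) < threshold for e in seen_embs)


def safety_term(speed_kmh: float, lane_offset_m: float,
                speed_limit_kmh: float = 30.0,
                max_offset_m: float = 1.0) -> float:
    """Assumed safety score in [0, 1]: zero above the speed limit,
    otherwise decreasing linearly with distance from the lane center."""
    speed_ok = 1.0 if speed_kmh <= speed_limit_kmh else 0.0
    centering = max(0.0, 1.0 - abs(lane_offset_m) / max_offset_m)
    return speed_ok * centering


def composite_reward(frame_emb: Vector, prompt_emb: Vector,
                     predicted_emb: Vector, ideal_emb: Vector,
                     speed_kmh: float, lane_offset_m: float,
                     weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of semantic, safety, and predictive terms (assumed form)."""
    w_sem, w_safe, w_pred = weights
    semantic = cosine(frame_emb, prompt_emb)        # (i) semantic anchoring
    safety = safety_term(speed_kmh, lane_offset_m)  # (iii) kinematic constraints
    predictive = cosine(predicted_emb, ideal_emb)   # (iv) world-model alignment
    return w_sem * semantic + w_safe * safety + w_pred * predictive
```

In this sketch, `needs_new_prompt` plays the role of component (ii): when it returns True, a prompt generator (not shown) would produce a fresh prompt whose embedding replaces `prompt_emb`. How DriveMind actually combines and weights its terms is not specified in the text above.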
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Traffic control and management
Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator
