VLM-UDMC: VLM-Enhanced Unified Decision-Making and Motion Control for Urban Autonomous Driving

Haichao Liu; Haoren Guo; Pei Liu; Benshan Ma; Yuxiang Zhang; Jun Ma; Tong Heng Lee

arXiv:2507.15266·cs.RO·July 22, 2025


TL;DR

This paper introduces VLM-UDMC, a framework that integrates vision-language models into unified decision-making and motion control for urban autonomous driving, using them for scene understanding, risk assessment, and real-time trajectory prediction. The framework is verified through simulations and real-world tests.

Contribution

The paper presents a new VLM-augmented decision-making framework that combines scene reasoning, risk awareness, and trajectory prediction for urban autonomous driving.

Findings

01. Improved driving performance in urban scenarios.

02. Effective integration of multimodal scene understanding.

03. Validated results through simulations and real-world experiments.

Abstract

Scene understanding and risk-aware attention are crucial for human drivers to make safe and effective driving decisions. To imitate this cognitive ability in urban autonomous driving while ensuring transparency and interpretability, we propose a vision-language model (VLM)-enhanced unified decision-making and motion control framework, named VLM-UDMC. This framework incorporates scene reasoning and risk-aware insights into an upper-level slow system, which dynamically reconfigures the optimal motion planning for the downstream fast system. The reconfiguration is based on real-time environmental changes, which are encoded through context-aware potential functions. More specifically, the upper-level slow system employs a two-step reasoning policy with Retrieval-Augmented Generation (RAG), leveraging foundation models to process multimodal inputs and retrieve contextual knowledge,…
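
The slow/fast split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `PotentialConfig` fields, the scene tags, the `slow_system` rule table (a stand-in for the VLM reasoning step), and the cost terms (a quadratic lane-keeping term plus a textbook inverse-distance repulsive potential) are all assumptions chosen for illustration. It only shows the general idea of an upper layer reconfiguring the potential functions that a lower-level planner would minimize.

```python
import math
from dataclasses import dataclass


@dataclass
class PotentialConfig:
    """Hypothetical parameters the slow system hands to the fast planner."""
    obstacle_weight: float = 1.0    # scales repulsion from other road users
    lane_weight: float = 0.5        # scales attraction to the lane center
    influence_radius: float = 10.0  # metres; farther obstacles are ignored


def slow_system(scene_tags):
    """Stand-in for VLM scene reasoning: map scene tags to a config.

    A real system would derive this from model output; the tags and
    numbers here are illustrative assumptions only.
    """
    cfg = PotentialConfig()
    if "pedestrian_nearby" in scene_tags:
        cfg.obstacle_weight = 5.0
        cfg.influence_radius = 15.0
    if "lane_change" in scene_tags:
        cfg.lane_weight = 0.1
    return cfg


def potential_cost(ego_xy, obstacles_xy, lane_center_y, cfg):
    """Cost a fast planner could minimize at a candidate ego position."""
    x, y = ego_xy
    # Quadratic penalty for lateral deviation from the lane center.
    cost = cfg.lane_weight * (y - lane_center_y) ** 2
    for ox, oy in obstacles_xy:
        d = math.hypot(x - ox, y - oy)
        if d < cfg.influence_radius:
            # Repulsive term that vanishes at the influence radius.
            gap = 1.0 / max(d, 1e-3) - 1.0 / cfg.influence_radius
            cost += cfg.obstacle_weight * gap ** 2
    return cost


calm = slow_system(set())
alert = slow_system({"pedestrian_nearby"})
ego, obstacles = (0.0, 0.0), [(5.0, 0.0)]
calm_cost = potential_cost(ego, obstacles, 0.0, calm)
alert_cost = potential_cost(ego, obstacles, 0.0, alert)
```

With the same ego position and obstacle, the reconfigured (`alert`) potential yields a higher cost than the default one, so a planner minimizing it would keep a larger margin. Only the configuration changes between the two calls; the planner-side cost function stays fixed.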
