DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction

Zhiyi Hou; Enhui Ma; Fang Li; Zhiyi Lai; Kalok Ho; Zhanqian Wu; Lijun Zhou; Long Chen; Chitian Sun; Haiyang Sun; Bing Wang; Guang Chen; Hangjun Ye; and Kaicheng Yu

arXiv:2507.02948 · cs.CV · July 15, 2025

Open Access

TL;DR

This paper introduces DriveMRP, which pairs a method for synthesizing high-risk motion data with a VLM-agnostic framework for motion risk prediction in autonomous driving. The approach raises accident recognition accuracy from 27.13% to 88.03% after fine-tuning and generalizes to real-world tests.

Contribution

The paper presents a BEV-based motion simulation technique for synthesizing high-risk motion data (DriveMRP-10K) and a VLM-agnostic risk estimation framework (DriveMRP-Agent) that improves vision-language models' ability to predict motion risks.

Findings

01. Accident recognition accuracy increased from 27.13% to 88.03% after fine-tuning.

02. Zero-shot evaluation accuracy improved from 29.42% to 68.50%.

03. Synthetic high-risk motion data effectively enhances model performance.
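
To put the reported accuracies on a common footing, the short sketch below computes the absolute gain (in percentage points) and the ratio to the baseline for each pair of figures. The dictionary keys and the `gains` helper are illustrative names, not something taken from the paper; only the four accuracy values come from the findings above.

```python
# Accuracy figures (percent) as reported in the findings above.
# The labels are illustrative shorthand, not terminology from the paper.
results = {
    "fine-tuned accident recognition": (27.13, 88.03),
    "zero-shot evaluation": (29.42, 68.50),
}


def gains(before, after):
    """Return (absolute gain in percentage points, ratio of after to before)."""
    return round(after - before, 2), round(after / before, 2)


summary = {name: gains(before, after) for name, (before, after) in results.items()}

for name, (points, ratio) in summary.items():
    print(f"{name}: +{points:.2f} points, {ratio:.2f}x the baseline")
```

Under these figures the fine-tuned setting gains 60.90 percentage points and the zero-shot setting gains 39.08, so the zero-shot improvement is smaller but still more than doubles the baseline accuracy.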

Abstract

Autonomous driving has seen significant progress, driven by extensive real-world data. However, in long-tail scenarios, accurately predicting the safety of the ego vehicle's future motion remains a major challenge due to uncertainties in dynamic environments and limitations in data coverage. In this work, we aim to explore whether it is possible to enhance the motion risk prediction capabilities of Vision-Language Models (VLM) by synthesizing high-risk motion data. Specifically, we introduce a Bird's-Eye View (BEV) based motion simulation method to model risks from three aspects: the ego-vehicle, other vehicles, and the environment. This allows us to synthesize plug-and-play, high-risk motion data suitable for VLM training, which we call DriveMRP-10K. Furthermore, we design a VLM-agnostic motion risk estimation framework, named DriveMRP-Agent. This framework incorporates a novel…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications