SimpleVSF: VLM-Scoring Fusion for Trajectory Prediction of End-to-End Autonomous Driving

Peiru Zheng; Yun Zhao; Zhan Gong; Hong Zhu; Shaohua Wu

arXiv:2510.17191·cs.RO·October 29, 2025

TL;DR

SimpleVSF is a framework that improves end-to-end autonomous driving decision-making by combining vision-language model (VLM) scoring with trajectory fusion. It was the leading approach in the ICCV 2025 NAVSIM v2 End-to-End Driving Challenge.

Contribution

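The paper introduces SimpleVSF, which scores trajectories with both conventional and VLM-enhanced scorers and then fuses the results in two ways: a weight fusioner for quantitative aggregation and a VLM-based fusioner for qualitative, context-aware decisions.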

Findings

01

Achieves state-of-the-art performance in the ICCV 2025 NAVSIM v2 End-to-End Driving Challenge.

02

Balances safety, comfort, and efficiency effectively.

03

Demonstrates robustness in complex driving scenarios.

Abstract

End-to-end autonomous driving has emerged as a promising paradigm for achieving robust and intelligent driving policies. However, existing end-to-end methods still face significant challenges, such as suboptimal decision-making in complex scenarios. In this paper, we propose SimpleVSF (Simple VLM-Scoring Fusion), a novel framework that enhances end-to-end planning by leveraging the cognitive capabilities of Vision-Language Models (VLMs) and advanced trajectory fusion techniques. We use both conventional scorers and novel VLM-enhanced scorers, and we pair a robust weight fusioner for quantitative aggregation with a powerful VLM-based fusioner for qualitative, context-aware decision-making. As the leading approach in the ICCV 2025 NAVSIM v2 End-to-End Driving Challenge, our SimpleVSF framework demonstrates state-of-the-art performance, achieving a superior balance between safety,…
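To make the "weight fusioner for quantitative aggregation" idea concrete, here is a minimal sketch of weighted score fusion over candidate trajectories. It is an illustration only, not the authors' implementation: the abstract does not give the normalization, the weights, or the scorer interfaces, so the min-max normalization, the function names (`min_max_normalize`, `fuse_scores`, `select_trajectory`), and the scorer labels are all assumptions. The VLM-based fusioner is not sketched here.

```python
from typing import Dict, List, Sequence, Tuple


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(
    score_table: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted sum of per-scorer normalized scores, one value per candidate.

    score_table maps a (hypothetical) scorer name to its scores, with one
    score per candidate trajectory. Weights are renormalized to sum to 1.
    """
    if not score_table:
        raise ValueError("score_table must contain at least one scorer")
    n = len(next(iter(score_table.values())))
    total_w = sum(weights[name] for name in score_table)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * n
    for name, scores in score_table.items():
        if len(scores) != n:
            raise ValueError(f"scorer {name!r} has {len(scores)} scores, expected {n}")
        w = weights[name] / total_w
        for i, value in enumerate(min_max_normalize(scores)):
            fused[i] += w * value
    return fused


def select_trajectory(
    score_table: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> Tuple[int, List[float]]:
    """Return the index of the highest fused score and all fused scores."""
    fused = fuse_scores(score_table, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Three candidate trajectories scored by two hypothetical scorers.
    table = {
        "conventional_scorer": [0.2, 0.8, 0.5],
        "vlm_scorer": [0.9, 0.6, 0.1],
    }
    w = {"conventional_scorer": 0.5, "vlm_scorer": 0.5}
    best_index, fused_scores = select_trajectory(table, w)
    print(best_index, fused_scores)
```

Normalizing each scorer before weighting keeps a scorer with a wide numeric range from dominating the sum. This is one common design choice, not one the abstract confirms.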
