DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment

Yu Gao; Anqing Jiang; Yiru Wang; Wang Jijun; Hao Jiang; Zhigang Sun; Heng Yuwen; Wang Shuo; Hao Zhao; Sun Hao

arXiv:2510.17148 · cs.RO · November 5, 2025

TL;DR

DiffVLA++ is a novel autonomous driving framework that combines cognitive reasoning and end-to-end planning through metric-guided alignment, improving generalization and physical feasibility in complex scenarios.

Contribution

It introduces a VLA module for semantic trajectories, an E2E module with a dense trajectory vocabulary, and a metric-guided scorer to align their outputs, enhancing autonomous driving performance.
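The summary above does not say how the scorer works internally. As a rough illustration only, one generic way to reconcile candidates from two planners is to pool them, score each with a shared metric, and keep the best. The sketch below assumes that score-and-select pattern; `toy_metric`, `select_trajectory`, and the waypoint representation are hypothetical names and choices, not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a trajectory is a sequence of (x, y) waypoints.
Trajectory = Sequence[Tuple[float, float]]


def toy_metric(traj: Trajectory) -> float:
    """Illustrative score (an assumption, not the paper's metric):
    forward progress along x minus a roughness penalty computed from
    squared second differences of consecutive waypoints."""
    progress = traj[-1][0] - traj[0][0]
    roughness = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(traj, traj[1:], traj[2:]):
        roughness += (x2 - 2 * x1 + x0) ** 2 + (y2 - 2 * y1 + y0) ** 2
    return progress - roughness


def select_trajectory(
    vla_candidates: List[Trajectory],
    e2e_candidates: List[Trajectory],
    metric: Callable[[Trajectory], float] = toy_metric,
) -> Tuple[str, Trajectory]:
    """Pool candidates from both sources and return (source, trajectory)
    for the candidate with the highest metric value."""
    pooled = [("vla", t) for t in vla_candidates]
    pooled += [("e2e", t) for t in e2e_candidates]
    if not pooled:
        raise ValueError("no candidate trajectories supplied")
    return max(pooled, key=lambda item: metric(item[1]))


# Usage: a straight path versus a zig-zag path covering the same distance.
smooth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
zigzag = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
source, best = select_trajectory([smooth], [zigzag])
```

In this toy setup the straight path wins because both candidates make equal progress and only the zig-zag incurs a roughness penalty. A real scorer would replace `toy_metric` with driving-quality metrics such as those behind the EPDMS reported in the findings.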

Findings

01. Achieves an EPDMS of 49.12 on the ICCV 2025 Autonomous Grand Challenge leaderboard.

02. Effectively combines world knowledge and physical reasoning for robust driving.

03. Demonstrates improved handling of long-tail and challenging scenarios.

Abstract

Conventional end-to-end (E2E) driving models are effective at generating physically plausible trajectories, but they often fail to generalize to long-tail scenarios because they lack the world knowledge needed to understand and reason about their surroundings. In contrast, Vision-Language-Action (VLA) models leverage world knowledge to handle challenging cases, but their limited 3D reasoning capability can lead to physically infeasible actions. In this work, we introduce DiffVLA++, an enhanced autonomous driving framework that explicitly bridges cognitive reasoning and E2E planning through metric-guided alignment. First, we build a VLA module that directly generates semantically grounded driving trajectories. Second, we design an E2E module with a dense trajectory vocabulary that ensures physical feasibility. Third, and most critically, we introduce a metric-guided trajectory scorer that…

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Robotic Path Planning Algorithms