DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment
Yu Gao, Anqing Jiang, Yiru Wang, Wang Jijun, Hao Jiang, Zhigang Sun, Heng Yuwen, Wang Shuo, Hao Zhao, Sun Hao

TL;DR
DiffVLA++ is a novel autonomous driving framework that combines cognitive reasoning and end-to-end planning through metric-guided alignment, improving generalization and physical feasibility in complex scenarios.
Contribution
DiffVLA++ introduces a VLA module that generates semantically grounded trajectories, an E2E module whose dense trajectory vocabulary ensures physical feasibility, and a metric-guided scorer that aligns the outputs of the two modules.
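As a rough illustration of how a metric-guided scorer could arbitrate between the two modules, the sketch below ranks candidate trajectories from both sources with a single scoring function and keeps the best one. Everything here is an assumption made for illustration: the `Candidate` type, the `toy_metric` function (forward progress, zeroed on a near-collision), the `select_trajectory` helper, and the sample waypoints are hypothetical and are not taken from the paper, whose scorer and metric (EPDMS) are considerably richer.

```python
from dataclasses import dataclass
from math import hypot
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in the ego frame, metres (assumed convention)


@dataclass
class Candidate:
    source: str             # "vla" or "e2e"; labels are illustrative only
    waypoints: List[Point]  # future ego positions along the trajectory


def toy_metric(traj: Sequence[Point], obstacles: Sequence[Point],
               safe_dist: float = 2.0) -> float:
    """Toy stand-in for a driving metric (NOT EPDMS).

    Returns forward progress along x, or 0.0 if any waypoint comes
    closer than `safe_dist` to any obstacle.
    """
    min_gap = min(
        (hypot(px - ox, py - oy) for px, py in traj for ox, oy in obstacles),
        default=float("inf"),
    )
    if min_gap < safe_dist:
        return 0.0
    return traj[-1][0] - traj[0][0]


def select_trajectory(candidates: Sequence[Candidate],
                      score_fn: Callable[[Candidate], float]) -> Candidate:
    """Return the candidate with the highest score."""
    return max(candidates, key=score_fn)


# Hypothetical scene: one obstacle directly ahead of the ego vehicle.
obstacles = [(10.0, 0.0)]
vla = Candidate("vla", [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (15.0, 0.0)])
e2e = Candidate("e2e", [(0.0, 0.0), (5.0, 1.5), (10.0, 3.0), (14.0, 3.0)])

best = select_trajectory([vla, e2e], lambda c: toy_metric(c.waypoints, obstacles))
print(best.source)
```

In this toy scene the straight-ahead candidate passes through the obstacle and scores zero, so the swerving candidate is selected. A learned scorer would replace `toy_metric` with a model trained to predict the benchmark metric; that substitution is the only part of this sketch meant to mirror the described design.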
Findings
Achieves an EPDMS of 49.12 on the ICCV 2025 Autonomous Grand Challenge leaderboard.
Effectively combines world knowledge and physical reasoning for robust driving.
Demonstrates improved handling of long-tail and challenging scenarios.
Abstract
Conventional end-to-end (E2E) driving models are effective at generating physically plausible trajectories, but they often fail to generalize to long-tail scenarios because they lack the world knowledge needed to understand and reason about their surroundings. In contrast, Vision-Language-Action (VLA) models leverage world knowledge to handle challenging cases, but their limited 3D reasoning capability can lead to physically infeasible actions. In this work, we introduce DiffVLA++, an enhanced autonomous driving framework that explicitly bridges cognitive reasoning and E2E planning through metric-guided alignment. First, we build a VLA module that directly generates semantically grounded driving trajectories. Second, we design an E2E module with a dense trajectory vocabulary that ensures physical feasibility. Third, and most critically, we introduce a metric-guided trajectory scorer that…
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Robotic Path Planning Algorithms
