ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving

Yunsheng Ma; Burhaneddin Yaman; Xin Ye; Mahmut Yurt; Jingru Luo; Abhirup Mallik; Ziran Wang; Liu Ren

arXiv:2505.15158 · cs.CV · May 22, 2025

TL;DR

ALN-P3 introduces a unified framework that aligns vision and language modules across perception, prediction, and planning in autonomous driving, improving performance and interpretability without extra inference costs.

Contribution

The paper proposes a co-distillation framework with three alignment mechanisms (Perception, Prediction, and Planning Alignment) that unify vision and language reasoning in autonomous driving systems.

Findings

01 Achieves state-of-the-art results on four benchmarks.

02 Significantly improves driving decision accuracy.

03 Enhances language reasoning capabilities in autonomous systems.

Abstract

Recent advances have explored integrating large language models (LLMs) into end-to-end autonomous driving systems to enhance generalization and interpretability. However, most existing approaches are limited to either driving performance or vision-language reasoning, making it difficult to achieve both simultaneously. In this paper, we propose ALN-P3, a unified co-distillation framework that introduces cross-modal alignment between "fast" vision-based autonomous driving systems and "slow" language-driven reasoning modules. ALN-P3 incorporates three novel alignment mechanisms: Perception Alignment (P1A), Prediction Alignment (P2A), and Planning Alignment (P3A), which explicitly align visual tokens with corresponding linguistic outputs across the full perception, prediction, and planning stack. All alignment modules are applied only during training and incur no additional costs during…
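
The abstract's central idea, alignment terms that shape training but never run at inference, can be illustrated with a minimal sketch. The abstract does not state the actual loss functions, so the cosine-distance loss, the function names (`cosine`, `alignment_loss`, `training_objective`), and the per-stage weights below are all assumptions made for illustration, not the authors' method. Only the stage labels P1A, P2A, and P3A come from the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards against zero vectors

def alignment_loss(visual_tokens, text_embeddings):
    """Mean cosine distance over paired vectors.

    ASSUMPTION: cosine distance stands in for whatever alignment
    objective the paper actually uses.
    """
    if len(visual_tokens) != len(text_embeddings):
        raise ValueError("visual and text sequences must be paired one-to-one")
    distances = [1.0 - cosine(v, t) for v, t in zip(visual_tokens, text_embeddings)]
    return sum(distances) / len(distances)

def training_objective(driving_loss, stage_pairs, weights):
    """Driving loss plus weighted alignment terms, one per stage.

    stage_pairs maps a stage name ("P1A", "P2A", "P3A") to a
    (visual_tokens, text_embeddings) pair. The weights are hypothetical.
    This function is only called during training; an inference forward
    pass would never touch it, so it adds no inference cost.
    """
    aux = sum(weights[name] * alignment_loss(v, t)
              for name, (v, t) in stage_pairs.items())
    return driving_loss + aux

# Toy usage with 2-D vectors: perfectly aligned perception tokens,
# orthogonal planning tokens.
pairs = {
    "P1A": ([[1.0, 0.0]], [[1.0, 0.0]]),
    "P3A": ([[1.0, 0.0]], [[0.0, 1.0]]),
}
weights = {"P1A": 0.5, "P3A": 0.5}
loss = training_objective(1.0, pairs, weights)
```

In this toy case the aligned stage contributes roughly zero and the orthogonal stage contributes its full weight, so the auxiliary terms only penalize stages where the visual and language representations disagree.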

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: ALIGN