LAD-Drive: Bridging Language and Trajectory with Action-Aware Diffusion Transformers

Fabian Schmidt; Karol Fedurko; Markus Enzweiler; Abhinav Valada

arXiv:2603.02035·cs.RO·March 3, 2026

TL;DR

LAD-Drive is a generative framework that disentangles high-level intention from low-level spatial planning and uses a diffusion model to produce safe, kinematically feasible trajectories for autonomous driving. It reports up to 59% higher Driving Score on the LangAuto benchmark.

Contribution

The paper presents LAD-Drive, which explicitly models driving intention and its uncertainty as a probabilistic meta-action distribution, improving trajectory generation for autonomous vehicles over prior methods that rely on unimodal planning heads or one-hot encoded action conditioning.

Findings

01 Achieves up to 59% higher Driving Score on the LangAuto benchmark.

02 Significantly reduces route deviations and collisions.

03 Outperforms competitive baselines in autonomous driving tasks.

Abstract

While multimodal large language models (MLLMs) provide advanced reasoning for autonomous driving, translating their discrete semantic knowledge into continuous trajectories remains a fundamental challenge. Existing methods often rely on unimodal planning heads that inherently limit their ability to represent multimodal driving behavior. Furthermore, generative approaches frequently condition on one-hot encoded actions, discarding the nuanced navigational uncertainty critical for complex scenarios. To resolve these limitations, we introduce LAD-Drive, a generative framework that structurally disentangles high-level intention from low-level spatial planning. LAD-Drive employs an action decoder to infer a probabilistic meta-action distribution, establishing an explicit belief state that preserves the nuanced intent typically lost by one-hot encodings. This distribution, fused with the…
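
The contrast the abstract draws between one-hot action conditioning and a probabilistic meta-action distribution can be illustrated with a small sketch. This is not the paper's implementation: the action names, logits, and 2-D embeddings below are hypothetical values chosen for illustration, and the probability-weighted sum is just one simple way a distribution could be turned into a conditioning vector.

```python
import math

# Hypothetical meta-action vocabulary (an assumption; the paper's action set is not listed here).
META_ACTIONS = ["keep_lane", "turn_left", "turn_right", "stop"]

# Hypothetical 2-D embedding per meta-action, one row per entry in META_ACTIONS.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [-1.0, 0.0]]


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(probs):
    """Collapse a distribution onto its most likely action."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def entropy(probs):
    """Shannon entropy in nats; zero means no uncertainty is represented."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def condition_vector(weights, embeddings):
    """Weighted sum of action embeddings, used here as a stand-in conditioning signal."""
    dim = len(embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, embeddings)) for d in range(dim)]


# Made-up decoder scores for an ambiguous junction: left and right are nearly tied.
logits = [0.2, 1.0, 0.9, -2.0]

probs = softmax(logits)  # soft belief over meta-actions
hard = one_hot(probs)    # one-hot encoding keeps only the argmax

soft_cond = condition_vector(probs, EMBEDDINGS)
hard_cond = condition_vector(hard, EMBEDDINGS)

for name, p, h in zip(META_ACTIONS, probs, hard):
    print(f"{name:>10}: soft={p:.3f} one_hot={h:.0f}")
print("entropy soft:", round(entropy(probs), 3), "entropy one-hot:", entropy(hard))
print("soft condition:", soft_cond, "hard condition:", hard_cond)
```

In this toy setup the one-hot vector reports a confident left turn even though the right turn scored almost as high, while the soft distribution keeps that near-tie visible to whatever consumes the conditioning signal.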

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics