LAD-Drive: Bridging Language and Trajectory with Action-Aware Diffusion Transformers
Fabian Schmidt, Karol Fedurko, Markus Enzweiler, Abhinav Valada

TL;DR
LAD-Drive is a generative framework that disentangles high-level intention from low-level planning and uses a diffusion model to produce safe, kinematically feasible trajectories for autonomous driving, with a reported Driving Score up to 59% higher on the LangAuto benchmark.
Contribution
The paper presents LAD-Drive, an approach that explicitly models intention and uncertainty, improving trajectory generation for autonomous vehicles over prior methods that rely on unimodal planning heads or one-hot action encodings.
Findings
Achieves up to 59% higher Driving Score on the LangAuto benchmark.
Reduces route deviations and collisions significantly.
Outperforms competitive baselines in autonomous driving tasks.
Abstract
While multimodal large language models (MLLMs) provide advanced reasoning for autonomous driving, translating their discrete semantic knowledge into continuous trajectories remains a fundamental challenge. Existing methods often rely on unimodal planning heads that inherently limit their ability to represent multimodal driving behavior. Furthermore, most generative approaches condition on one-hot encoded actions, discarding the nuanced navigational uncertainty critical for complex scenarios. To resolve these limitations, we introduce LAD-Drive, a generative framework that structurally disentangles high-level intention from low-level spatial planning. LAD-Drive employs an action decoder to infer a probabilistic meta-action distribution, establishing an explicit belief state that preserves the nuanced intent typically lost by one-hot encodings. This distribution, fused with the…
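The contrast between a probabilistic meta-action distribution and a one-hot encoding can be illustrated with a small sketch. This is not the paper's implementation: the meta-action names, the toy 2-D embeddings, and the weighted-sum fusion in `condition_vector` are all assumptions chosen for illustration. It only shows why collapsing a belief state to its argmax discards information that a soft distribution keeps.

```python
import math

# Hypothetical meta-action vocabulary; the paper's actual action set is not given here.
META_ACTIONS = ["keep_lane", "turn_left", "turn_right", "stop"]

# Toy 2-D embeddings, one per meta-action (illustrative values only).
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [-1.0, 0.0]]


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(probs):
    """Collapse a distribution to its most likely action."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def entropy(probs):
    """Shannon entropy in nats; zero when all mass sits on one action."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def condition_vector(weights, embeddings):
    """Weighted sum of per-action embeddings (an assumed fusion scheme)."""
    dim = len(embeddings[0])
    return [sum(w * e[d] for w, e in zip(weights, embeddings)) for d in range(dim)]


# Scores that are ambiguous between keep_lane and turn_left.
logits = [2.0, 1.8, -1.0, -2.0]
belief = softmax(logits)   # soft belief state over meta-actions
hard = one_hot(belief)     # what a one-hot encoding would keep

soft_cond = condition_vector(belief, EMBEDDINGS)
hard_cond = condition_vector(hard, EMBEDDINGS)

print("belief:", [round(p, 3) for p in belief])
print("entropy soft vs hard:", entropy(belief), entropy(hard))
print("soft conditioning:", soft_cond)
print("hard conditioning:", hard_cond)
```

In this toy case the one-hot conditioning vector equals the `keep_lane` embedding exactly, while the soft version also carries a component along `turn_left`, so a downstream trajectory generator could in principle still see the competing intent.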
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
