LADY: Linear Attention for Autonomous Driving Efficiency without Transformers
Jihao Huang, Xi Xia, Zhiyuan Li, Tianle Liu, Jingke Wang, Junbo Chen, Tengju Ye

TL;DR
LADY introduces a fully linear attention-based model for autonomous driving that efficiently fuses long-range temporal and cross-modal information, achieving state-of-the-art results with constant computational costs suitable for edge deployment.
Contribution
LADY is presented as the first linear attention model to support cross-modal and cross-temporal interactions for autonomous driving.
Findings
Achieves state-of-the-art performance on NAVSIM and Bench2Drive benchmarks.
Maintains constant computational and memory costs regardless of sequence length.
Successfully deployed on resource-limited edge devices.
Abstract
End-to-end paradigms have demonstrated great potential for autonomous driving, and most existing methods are built upon Transformer architectures. However, Transformers incur a quadratic attention cost, limiting their ability to model long spatial and temporal sequences, particularly on resource-constrained edge platforms. As autonomous driving inherently demands efficient temporal modeling, this challenge severely limits their deployment and real-time performance. Recently, linear attention mechanisms have gained increasing attention due to their lower time and memory complexity. However, existing linear attention architectures are limited to self-attention, lacking support for cross-modal and cross-temporal interactions, both of which are crucial for autonomous driving. In this work, we propose LADY, the first fully linear attention-based generative model for end-to-end autonomous…
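To make the quadratic-versus-constant cost contrast concrete, here is a minimal sketch of generic causal kernelized linear attention in NumPy. It is not LADY's architecture: the feature map (`elu(x) + 1`), the function names, and the shapes are assumptions chosen for illustration. The recurrent form carries only a fixed-size state `(S, z)` from one time step to the next, so per-step compute and memory do not grow with sequence length, while the masked form builds the full T x T score matrix and yields the same output.

```python
import numpy as np

def feature_map(x):
    # Assumed positive feature map: elu(x) + 1 (x + 1 for x > 0, exp(x) otherwise).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention_recurrent(Q, K, V):
    """Causal linear attention as a recurrence with a fixed-size state."""
    T, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d_v))  # running sum of outer(phi(k_t), v_t)
    z = np.zeros(d)         # running sum of phi(k_t), used for normalization
    out = np.empty((T, d_v))
    for t in range(T):
        q, k = feature_map(Q[t]), feature_map(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z)  # state size is independent of t
    return out

def linear_attention_quadratic(Q, K, V):
    """Same computation via an explicit T x T causal score matrix (reference only)."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    scores = np.tril(phi_q @ phi_k.T)  # mask out future positions
    return (scores @ V) / scores.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
    V = rng.normal(size=(8, 3))
    same = np.allclose(linear_attention_recurrent(Q, K, V),
                       linear_attention_quadratic(Q, K, V))
    print("recurrent and quadratic forms agree:", same)
```

The sketch covers self-attention over a single sequence only. The cross-modal and cross-temporal interactions that the abstract attributes to LADY are not modeled here.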
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Multimodal Machine Learning Applications
