DeeAD: Dynamic Early Exit of Vision-Language Action for Efficient Autonomous Driving
Haibo Hu, Lianming Huang, Nan Guan, Chun Jason Xue

TL;DR
DeeAD is a training-free early-exit framework for vision-language action models in autonomous driving. It reduces inference latency by 29% on Bench2Drive by evaluating the feasibility of intermediate trajectories and adaptively skipping redundant layers.
Contribution
DeeAD introduces a training-free early-exit method, paired with a multi-hop controller, that makes existing VLA models more efficient without retraining.
Findings
Achieves up to 28% transformer-layer sparsity
Reduces latency by 29%
Maintains planning quality and safety
Abstract
Vision-Language Action (VLA) models unify perception, reasoning, and trajectory generation for autonomous driving, but suffer from significant inference latency due to deep transformer stacks. We present DeeAD, a training-free, action-guided early-exit framework that accelerates VLA planning by evaluating the physical feasibility of intermediate trajectories. Instead of relying on confidence scores, DeeAD terminates inference when predicted trajectories align with lightweight planning priors (e.g., Navigation or Low-precision Planning) within a tolerable deviation (< 2 m). To improve efficiency, we introduce a multi-hop controller that adaptively skips redundant layers based on the change rate of scores. DeeAD integrates into existing VLA models, such as ORION, without requiring retraining. Experiments on the Bench2Drive benchmark demonstrate up to 28% transformer-layer sparsity and 29%…
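The exit rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `decode_at`, `max_deviation`, `slow_change`, and `max_hop` are hypothetical, the deviation score is assumed to be the largest waypoint-wise Euclidean distance to the prior, and the hop rule (double the hop while the score barely changes, otherwise reset to 1) is only one plausible reading of "change rate of scores".

```python
import math
from typing import Callable, List, Tuple

Waypoint = Tuple[float, float]
Trajectory = List[Waypoint]


def max_deviation(traj: Trajectory, prior: Trajectory) -> float:
    """Largest Euclidean distance (metres) between paired waypoints.

    Assumption: the abstract does not define the score; this is a stand-in.
    """
    return max(math.dist(p, q) for p, q in zip(traj, prior))


def early_exit(
    decode_at: Callable[[int], Trajectory],  # hypothetical: trajectory decoded at a layer
    num_layers: int,
    prior: Trajectory,
    tol_m: float = 2.0,        # tolerable deviation from the abstract (< 2 m)
    slow_change: float = 0.1,  # assumed threshold for "score is barely changing"
    max_hop: int = 4,          # assumed cap on layers jumped between checks
) -> Tuple[int, Trajectory]:
    """Return (exit_layer, trajectory) for the first checked layer within tolerance."""
    layer, hop, prev_score = 0, 1, None
    while True:
        traj = decode_at(layer)
        score = max_deviation(traj, prior)
        # Exit when the trajectory agrees with the prior, or at the final layer.
        if score < tol_m or layer == num_layers - 1:
            return layer, traj
        # Assumed multi-hop rule: a flat score means nearby layers are
        # unlikely to pass, so jump further before the next check.
        if prev_score is not None and abs(prev_score - score) < slow_change:
            hop = min(hop * 2, max_hop)
        else:
            hop = 1
        prev_score = score
        layer = min(layer + hop, num_layers - 1)
```

In this sketch the final layer acts as a fallback, so a trajectory is always returned even if no intermediate layer falls within tolerance of the prior.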
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Robotic Path Planning Algorithms
