EvoDriveVLA: Evolving Driving VLA Models via Collaborative Perception-Planning Distillation
Jiajun Cao, Xiaoan Zhang, Xiaobao Wei, Liyuqiu Huang, Zijian Wang, Hanzhen Zhang, Zhengyu Jia, Wei Mao, Hao Wang, Xianming Liu, Shuchang Zhou, Yang Wang, Shanghang Zhang

TL;DR
EvoDriveVLA introduces a collaborative distillation framework for vision-language-action models in autonomous driving, improving perception and planning stability through self-anchored visual distillation and future-informed trajectory distillation.
Contribution
It presents a novel distillation approach combining perceptual constraints and trajectory reasoning to enhance VLA model performance in autonomous driving.
Findings
Achieves state-of-the-art results on the nuScenes dataset.
Significantly improves closed-loop evaluation performance.
Effectively models future trajectory evolutions.
Abstract
Vision-Language-Action (VLA) models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and future-informed trajectory optimization. Specifically, self-anchored visual distillation leverages a self-anchor teacher to deliver visual anchoring constraints, regularizing student representations via trajectory-guided key-region awareness. In parallel, future-informed trajectory distillation employs a future-aware oracle teacher with coarse-to-fine trajectory refinement and Monte Carlo dropout sampling to synthesize reasoning trajectories that model future evolutions, enabling the student model to…
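The Monte Carlo dropout sampling mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the toy two-layer trajectory head, the helper names (`make_params`, `trajectory_head`, `mc_dropout_sample`, `summarize`), the dropout rate, and the sample count are all assumptions chosen for illustration. The sketch only shows the general idea of keeping dropout active at inference so that repeated forward passes yield a set of diverse candidate trajectories.

```python
import random


def make_params(in_dim, hidden_dim, horizon, seed=1):
    """Random weights for a toy two-layer head (hypothetical, illustration only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": matrix(hidden_dim, in_dim),
        "b1": [0.0] * hidden_dim,
        "w2": matrix(2 * horizon, hidden_dim),  # one (x, y) pair per future step
        "b2": [0.0] * (2 * horizon),
    }


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def dropout(values, rate, rng):
    """Inverted dropout: drop each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def trajectory_head(features, params, rate, rng):
    """One stochastic forward pass producing a list of (x, y) waypoints."""
    hidden = [max(0.0, h) for h in linear(features, params["w1"], params["b1"])]
    hidden = dropout(hidden, rate, rng)  # dropout deliberately left ON at inference
    flat = linear(hidden, params["w2"], params["b2"])
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def mc_dropout_sample(features, params, num_samples=8, rate=0.2, seed=0):
    """Draw `num_samples` trajectories, each with a fresh random dropout mask."""
    rng = random.Random(seed)
    return [trajectory_head(features, params, rate, rng) for _ in range(num_samples)]


def summarize(samples):
    """Per-waypoint mean position and spread across the sampled trajectories."""
    n = len(samples)
    summary = []
    for t in range(len(samples[0])):
        xs = [s[t][0] for s in samples]
        ys = [s[t][1] for s in samples]
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in zip(xs, ys)) / n
        summary.append(((mx, my), var))
    return summary


if __name__ == "__main__":
    params = make_params(in_dim=3, hidden_dim=16, horizon=4)
    samples = mc_dropout_sample([0.5, -1.0, 2.0], params, num_samples=8, rate=0.2)
    for (mx, my), var in summarize(samples):
        print(f"mean=({mx:.3f}, {my:.3f}) spread={var:.4f}")
```

With `rate=0.0` every pass is identical and the spread collapses to zero. With a nonzero rate, the sampled set gives a rough picture of how much the predicted waypoints vary across passes. How the paper selects or refines among such samples is not specified in the excerpt above.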
