FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning

Jiajun Cao; Qizhe Zhang; Peidong Jia; Xuhui Zhao; Bo Lan; Xiaoan Zhang; Zhuo Li; Xiaobao Wei; Sixiang Chen; Liyun Li; Xianming Liu; Ming Lu; Yang Wang; Shanghang Zhang

arXiv:2507.23318 · cs.CV · November 17, 2025

TL;DR

FastDriveVLA introduces a reconstruction-based visual token pruning method for autonomous driving. It retains tokens that carry foreground information, reducing computational cost while preserving performance in scene understanding and decision-making.

Contribution

The paper presents ReconPruner, a plug-and-play reconstruction-based token pruner for efficient autonomous driving models. ReconPruner is trained with a new adversarial foreground-background strategy on the large-scale nuScenes-FG dataset.

Findings

01 Achieves state-of-the-art results on the nuScenes planning benchmark.

02 Retains foreground information effectively, even at high pruning ratios.

03 Applies to different VLA models without retraining.

Abstract

Vision-Language-Action (VLA) models have demonstrated significant potential in complex scene understanding and action reasoning, leading to their increasing adoption in end-to-end autonomous driving systems. However, the long visual token sequences of VLA models greatly increase computational costs. Current visual token pruning methods for Vision-Language Models (VLMs) rely on either visual token similarity or visual-text attention, but both perform poorly in autonomous driving scenarios. Given that human drivers concentrate on relevant foreground areas while driving, we assert that retaining the visual tokens that contain this foreground information is essential for effective decision-making. Inspired by this, we propose FastDriveVLA, a novel reconstruction-based visual token pruning framework designed specifically for autonomous driving. FastDriveVLA includes a plug-and-play visual token…
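The general idea of score-based visual token pruning mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's ReconPruner: the function name `prune_tokens`, the `keep_ratio` parameter, and the per-token `scores` (standing in for some foreground-saliency estimate) are all hypothetical, chosen only to show how a fixed fraction of the highest-scoring tokens might be kept while preserving their original order.

```python
from typing import List, Sequence, Tuple


def prune_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    keep_ratio: float,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep the highest-scoring tokens, preserving their original order.

    `scores` is a hypothetical per-token saliency value (higher means more
    likely foreground); how such scores are produced is outside this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Number of tokens to keep: floor of the requested fraction, at least one.
    num_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank indices by score, highest first; ties go to the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    # Restore the original token order among the survivors.
    kept = sorted(ranked[:num_keep])
    return [tokens[i] for i in kept], kept


# Usage: four toy 2-D tokens, keeping the top half by score.
tokens = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]
scores = [0.1, 0.9, 0.3, 0.8]
kept_tokens, kept_idx = prune_tokens(tokens, scores, keep_ratio=0.5)
```

Sorting the surviving indices back into ascending order keeps the pruned sequence spatially consistent with the input, which is one plausible design choice when the downstream model depends on token position.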

Taxonomy

Topics: Robotic Path Planning Algorithms · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications