FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning
Jiajun Cao, Qizhe Zhang, Peidong Jia, Xuhui Zhao, Bo Lan, Xiaoan Zhang, Zhuo Li, Xiaobao Wei, Sixiang Chen, Liyun Li, Xianming Liu, Ming Lu, Yang Wang, Shanghang Zhang

TL;DR
FastDriveVLA introduces a reconstruction-based token pruning method for autonomous driving that efficiently retains foreground information, significantly reducing computational costs while maintaining high performance in scene understanding and decision-making.
Contribution
The paper presents ReconPruner, a novel plug-and-play reconstruction-based token pruner for efficient autonomous driving models. It is trained with a new adversarial foreground-background strategy on nuScenes-FG, a new large-scale dataset.
Findings
Achieves state-of-the-art results on the nuScenes planning benchmark.
Retains foreground information effectively, even at high pruning ratios.
Applies seamlessly to different VLA models without retraining.
Abstract
Vision-Language-Action (VLA) models have demonstrated significant potential in complex scene understanding and action reasoning, leading to their increasing adoption in end-to-end autonomous driving systems. However, the long visual token sequences of VLA models greatly increase computational costs. Current visual token pruning methods for Vision-Language Models (VLMs) rely on either visual token similarity or visual-text attention, but both perform poorly in autonomous driving scenarios. Given that human drivers concentrate on relevant foreground areas while driving, we assert that retaining visual tokens containing this foreground information is essential for effective decision-making. Inspired by this, we propose FastDriveVLA, a novel reconstruction-based vision token pruning framework designed specifically for autonomous driving. FastDriveVLA includes a plug-and-play visual token…
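The general idea of score-based visual token pruning described in the abstract can be illustrated with a minimal sketch. This is not the paper's ReconPruner: the function name `prune_tokens`, the `keep_ratio` parameter, and the per-token `scores` are all hypothetical, and the sketch simply assumes that some pruner has already assigned each visual token a saliency score where higher means "more likely foreground".

```python
from typing import List, Sequence, Tuple


def prune_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    keep_ratio: float,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep the highest-scoring fraction of visual tokens.

    `scores` is a hypothetical per-token saliency (higher = more likely
    foreground). How the paper's pruner computes it is not shown here.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    num_keep = max(1, round(len(tokens) * keep_ratio))
    # Rank token indices by descending score and take the top num_keep.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Re-sort the kept indices so the original token order is preserved.
    kept = sorted(ranked[:num_keep])
    return [tokens[i] for i in kept], kept


# Usage: four toy 2-dimensional tokens, keep the top 50% by score.
tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
scores = [0.9, 0.1, 0.8, 0.2]
kept_tokens, kept_idx = prune_tokens(tokens, scores, keep_ratio=0.5)
print(kept_idx)  # [0, 2]
```

Keeping the surviving tokens in their original order is a design choice of this sketch, made so that any positional information attached to the tokens stays consistent. Whether the paper does the same is not stated in the text above.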
Taxonomy
Topics: Robotic Path Planning Algorithms · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications
