History-Conditioned Spatio-Temporal Visual Token Pruning for Efficient Vision-Language Navigation
Qitong Wang, Yijun Liang, Ming Li, Tianyi Zhou, Christopher Rasmussen

TL;DR
This paper introduces a training-free spatio-temporal token pruning method for vision-language navigation. It cuts computational cost while maintaining navigation accuracy, enabling real-time robotic navigation without retraining the model.
Contribution
The paper proposes a plug-and-play token pruning framework that improves the efficiency of Vision-Language Navigation (VLN) systems built on Vision-Language-Action (VLA) models, without retraining or modifying the model.
Findings
Significantly outperforms existing pruning strategies.
Preserves navigation accuracy under extreme pruning.
Enables low-latency real-world robotic navigation.
Abstract
Vision-Language Navigation (VLN) enables robots to follow natural-language instructions in visually grounded environments, serving as a key capability for embodied robotic systems. Recent Vision-Language-Action (VLA) models have demonstrated strong navigation performance, but their high computational cost introduces latency that limits real-time deployment. We propose a training-free spatio-temporal vision token pruning framework tailored to VLA-based VLN. We apply spatial token selection to the current view, alongside spatio-temporal compression for historical memories, enabling efficient long-horizon inference while reducing redundant computation. Leveraging attention-based token importance and query-guided spatio-temporal filtering, the proposed approach preserves navigation-relevant information without retraining or modifying pretrained models, allowing plug-and-play integration…
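The abstract's two ingredients, importance-based selection of current-view tokens and query-guided filtering of historical tokens, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`prune_by_importance`, `query_relevance`, `prune_history`), the use of cosine similarity as the query-relevance score, and the fixed per-frame budget are all assumptions made for illustration. In a real VLA model the importance scores would come from the model's own attention maps.

```python
import numpy as np


def prune_by_importance(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving their original order.

    tokens: (N, D) array of visual token features.
    scores: (N,) array of per-token importance (e.g. attention received).
    Returns the kept tokens and their indices in the original sequence.
    """
    keep = max(1, min(keep, tokens.shape[0]))
    # argpartition finds the top-`keep` indices without a full sort.
    top = np.argpartition(-scores, keep - 1)[:keep]
    idx = np.sort(top)  # restore spatial order of the surviving tokens
    return tokens[idx], idx


def query_relevance(tokens, query):
    """Hypothetical relevance score: cosine similarity to one query vector."""
    t = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    q = query / (np.linalg.norm(query) + 1e-8)
    return t @ q


def prune_history(frames, query, keep_per_frame):
    """Prune each stored frame independently using query relevance.

    frames: list of (N_i, D) arrays, one per past observation.
    A fixed per-frame budget is an assumption of this sketch.
    """
    return [
        prune_by_importance(f, query_relevance(f, query), keep_per_frame)[0]
        for f in frames
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    current = rng.normal(size=(16, 8))   # 16 tokens from the current view
    attn = rng.random(16)                # stand-in for attention importance
    kept, idx = prune_by_importance(current, attn, keep=4)
    history = [rng.normal(size=(16, 8)) for _ in range(3)]
    instruction = rng.normal(size=8)     # stand-in for an instruction embedding
    compact = prune_history(history, instruction, keep_per_frame=2)
    print(kept.shape, idx, [f.shape for f in compact])
```

Because both steps only select a subset of existing tokens, a scheme of this shape needs no gradient updates, which is consistent with the training-free, plug-and-play property the abstract claims.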
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Advanced Neural Network Applications
