VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference

Ziyan Liu; Yeqiu Chen; Hongyi Cai; Tao Lin; Shuo Yang; Zheng Liu; Bo Zhao

arXiv:2511.16449 · cs.CV · February 11, 2026

Open Access

TL;DR

VLA-Pruner is a dual-level, temporal-aware visual token pruning method tailored to vision-language-action (VLA) models. It balances the visual information needed for semantic understanding against the information needed for action generation, which lets it reduce inference cost while maintaining performance on robotic tasks.

Contribution

It proposes a dual-level importance criterion and a token selection strategy aligned with the dual-system nature of VLA models (high-level semantic understanding and low-level action execution), making real-time robotic inference more practical.
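This summary does not spell out the criterion itself, so the following is only a hypothetical sketch of what a dual-level, temporal-aware selection could look like, not the paper's method. It assumes that every visual token already has a semantic score (for example from prefill attention) and an action-level score, that the action-level score is smoothed across control steps with an exponential moving average, and that the two criteria take turns filling a fixed token budget. The names `ema_update`, `dual_level_select`, `alpha`, and `budget` are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Set


def ema_update(prev: Optional[List[float]], current: Sequence[float], alpha: float) -> List[float]:
    """Hypothetical temporal smoothing: blend this step's action-level scores into a running estimate."""
    if prev is None:
        # First control step: no history yet, so use the current scores directly.
        return list(current)
    return [alpha * c + (1.0 - alpha) * p for c, p in zip(current, prev)]


def dual_level_select(semantic: Sequence[float], action: Sequence[float], budget: int) -> List[int]:
    """Hypothetical dual-level selection: the two criteria alternately pick their best unkept token."""
    n = len(semantic)
    budget = min(budget, n)
    # One ranking per criterion, best token first.
    queues = [
        sorted(range(n), key=lambda i: semantic[i], reverse=True),
        sorted(range(n), key=lambda i: action[i], reverse=True),
    ]
    positions = [0, 0]
    kept: Set[int] = set()
    turn = 0  # 0 = semantic criterion's turn, 1 = action criterion's turn
    while len(kept) < budget:
        queue = queues[turn]
        # Skip tokens that the other criterion already kept.
        while queue[positions[turn]] in kept:
            positions[turn] += 1
        kept.add(queue[positions[turn]])
        positions[turn] += 1
        turn = 1 - turn
    # Return kept indices in their original (spatial) order.
    return sorted(kept)
```

For example, with semantic scores `[0.9, 0.8, 0.1, 0.05]`, action scores `[0.0, 0.1, 0.7, 0.6]`, and a budget of 2, `dual_level_select` keeps tokens `[0, 2]`, whereas ranking by semantic score alone would keep `[0, 1]` and drop every token the action criterion favors. Alternating between the two rankings is just one simple way to guarantee that both criteria are represented in the budget.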

Findings

01

Achieves state-of-the-art performance across multiple VLA architectures.

02

Effectively balances semantic understanding and action execution.

03

Reduces computational cost while maintaining accuracy.

Abstract

Vision-Language-Action (VLA) models have shown great promise for embodied AI, yet the heavy computational cost of processing continuous visual streams severely limits their real-time deployment. Token pruning (keeping salient visual tokens and dropping redundant ones) has emerged as an effective approach for accelerating Vision-Language Models (VLMs), offering a solution for efficient VLA. However, these VLM-specific token pruning methods select tokens based solely on semantic salience metrics (e.g., prefill attention), while overlooking the VLA's intrinsic dual-system nature of high-level semantic understanding and low-level action execution. Consequently, these methods bias token retention toward semantic cues, discard critical information for action generation, and significantly degrade VLA performance. To bridge this gap, we propose VLA-Pruner, a versatile plug-and-play VLA-specific…
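The semantic-only baseline that the abstract critiques can be made concrete with a small sketch: visual tokens are ranked purely by how much attention they receive from the text tokens during prefill, and only the top fraction is kept. This is a generic illustration of VLM-style pruning, not VLA-Pruner itself; the function name `prune_by_prefill_attention`, the mean-over-queries aggregation, and the `keep_ratio` budget are assumptions.

```python
from typing import List, Sequence


def prune_by_prefill_attention(attn: Sequence[Sequence[float]], keep_ratio: float) -> List[int]:
    """Keep the visual tokens with the highest prefill attention (semantic salience only).

    attn[q][v] is the attention weight from text query token q to visual token v.
    """
    num_queries = len(attn)
    num_visual = len(attn[0])
    # Semantic salience of each visual token: mean attention it receives from the text queries.
    scores = [sum(row[v] for row in attn) / num_queries for v in range(num_visual)]
    k = max(1, round(keep_ratio * num_visual))
    # Take the k highest-scoring tokens, then restore their original spatial order.
    top = sorted(range(num_visual), key=lambda v: scores[v], reverse=True)[:k]
    return sorted(top)
```

With `attn = [[0.1, 0.5, 0.05, 0.35], [0.2, 0.4, 0.1, 0.3]]` and `keep_ratio=0.5`, the function keeps tokens `[1, 3]`. Nothing in this score reflects what the action head needs, which is exactly the bias toward semantic cues that the abstract describes.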

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning