VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm

Zhenkai Wu; Xiaowen Ma; Zhenliang Ni; Dengming Zhang; Han Shu; Xin Jiang; Xinghao Chen

arXiv:2512.02700 · cs.CV · February 27, 2026

Open Access

TL;DR

VLM-Pruner is a novel, training-free token pruning method for vision-language models that balances redundancy and spatial sparsity, improving efficiency while preserving important visual details.

Contribution

The paper introduces a centrifugal token pruning paradigm with buffering for spatial sparsity, enabling training-free pruning with better coverage of object regions.

Findings

1. Outperforms strong baselines across five VLMs
2. Achieves 88.9% pruning rate with significant speedup
3. Maintains high accuracy despite aggressive pruning

Abstract

Vision-language models (VLMs) excel at image understanding tasks, but the large number of visual tokens imposes significant computational costs, hindering deployment on mobile devices. Many pruning methods rely solely on token importance and thus overlook inter-token redundancy, retaining numerous duplicated tokens and wasting capacity. Although some redundancy-aware approaches have been proposed, they often ignore the spatial relationships among visual tokens. This can lead to overly sparse selections of retained tokens that fail to adequately cover the regions of target objects. To address these limitations, we propose VLM-Pruner, a training-free token pruning algorithm that explicitly balances redundancy and spatial sparsity. We introduce a centrifugal token pruning paradigm that enables near-to-far selection while prioritizing the preservation of fine-grained object details.…
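The near-to-far selection idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the function name `prune_tokens`, the cosine-similarity redundancy test, the `sim_threshold` value, and the importance-based fallback that fills any leftover budget are all assumptions made for illustration. The sketch starts from the most important token, visits the remaining tokens in order of grid distance from it, and keeps a token only if it is not too similar to the tokens already kept.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prune_tokens(features, importance, grid_w, keep, sim_threshold=0.9):
    """Hypothetical near-to-far, redundancy-aware token selection.

    features:   one feature vector per visual token, in row-major grid order
    importance: one importance score per token (e.g. from attention)
    grid_w:     width of the token grid
    keep:       number of tokens to retain (assumed >= 1)
    Returns the sorted indices of the retained tokens.
    """
    n = len(features)
    pos = [(i // grid_w, i % grid_w) for i in range(n)]

    # Assumption: the most important token serves as the starting pivot.
    pivot = max(range(n), key=lambda i: importance[i])
    selected = [pivot]

    # Visit the other tokens from near to far; break distance ties by importance.
    order = sorted(
        (i for i in range(n) if i != pivot),
        key=lambda i: (math.dist(pos[i], pos[pivot]), -importance[i]),
    )

    skipped = []
    for i in order:
        if len(selected) == keep:
            break
        # Keep the token only if it is not redundant with any kept token.
        if max(cosine(features[i], features[j]) for j in selected) < sim_threshold:
            selected.append(i)
        else:
            skipped.append(i)

    # Assumed fallback: fill any remaining budget with the most important
    # skipped tokens so exactly `keep` tokens are returned when possible.
    for i in sorted(skipped, key=lambda i: -importance[i]):
        if len(selected) == keep:
            break
        selected.append(i)

    return sorted(selected)
```

Visiting candidates by distance from the pivot keeps the retained set spatially compact around a salient region, while the similarity test stops near-duplicate neighbors from consuming the budget. A real implementation would likely operate on batched tensors rather than Python lists.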

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis