ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance

Duo Li; Zuhao Yang; Xiaoqin Zhang; Ling Shao; Shijian Lu

arXiv:2505.18757 · cs.CV · November 20, 2025

TL;DR

ToDRE introduces a training-free visual token pruning method that treats token diversity and task relevance as separate criteria, achieving a 2.6x inference speed-up while retaining 95% of model performance.

Contribution

The paper proposes a two-stage, training-free framework that effectively prunes visual tokens by leveraging token diversity and task relevance, improving inference speed in large vision-language models.

Findings

01

Prunes 90% of visual tokens after the vision encoder.

02

Achieves a 2.6x inference speed-up while maintaining 95% of model performance.

03

Effectively prunes all visual tokens in certain LLM decoder layers.
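
The third finding says some decoder layers can run with no visual tokens at all. This page does not describe how ToDRE scores task relevance, so the sketch below is purely hypothetical. It assumes relevance is measured by cross-modal attention from text tokens to visual tokens (one of the single metrics the abstract mentions). It drops every visual token when the text tokens direct little attention to the image overall, and otherwise keeps the top fraction. The function `relevance_prune` and its `keep_ratio` and `min_mass` thresholds are illustrative assumptions, not the paper's API.

```python
import numpy as np

def relevance_prune(attn, text_idx, visual_idx, keep_ratio=0.5, min_mass=0.05):
    """Hypothetical task-relevance pruning for one decoder layer.

    attn: array of shape (heads, seq_len, seq_len); row q holds the attention
          weights of query position q over all key positions.
    text_idx, visual_idx: lists of sequence positions of text / visual tokens.
    Returns the visual positions to keep for subsequent layers.
    """
    # Cross-modal block: attention from text queries to visual keys,
    # shape (heads, n_text, n_visual).
    cross = attn[:, text_idx, :][:, :, visual_idx]
    # Average share of each text query's attention that lands on visual tokens.
    mass = float(cross.sum(axis=-1).mean())
    if mass < min_mass:
        # Assumed rule: text barely looks at the image here, so drop all visual tokens.
        return []
    # Relevance of each visual token: mean attention it receives from text queries.
    relevance = cross.mean(axis=(0, 1))
    k = max(1, round(keep_ratio * len(visual_idx)))
    top = np.argsort(-relevance, kind="stable")[:k]
    # Return kept positions in their original sequence order.
    return sorted(visual_idx[i] for i in top)
```

Calling it once per decoder layer on that layer's attention weights, and carrying the returned positions forward, gives layer-wise pruning; an empty list means later layers process text tokens only.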

Abstract

Visual token pruning, which compresses and prunes redundant visual tokens, plays a critical role in efficient inference with large vision-language models (LVLMs). However, most existing work estimates visual redundancy using a single metric, such as cross-modal attention or visual token similarity. We show that visual token diversity and task-specific token relevance are two crucial yet orthogonal factors that complement each other in conveying useful information and should therefore be treated separately for more effective visual token pruning. Building upon this insight, we design ToDRE, a two-stage and training-free framework that incorporates Token Diversity and task RElevance for effective token compression and efficient LVLM inference. Instead of pruning redundant tokens, we introduce a greedy max-sum diversification algorithm that selects and retains a subset of diverse and…
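
The abstract names a greedy max-sum diversification step for choosing which visual tokens to retain. Below is a minimal NumPy sketch of the generic greedy heuristic for max-sum diversification. It assumes cosine distance between token features and seeds the selection with the token whose total distance to all others is largest; both choices are illustrative assumptions rather than ToDRE's documented settings, and the name `greedy_max_sum_diversify` is hypothetical.

```python
import numpy as np

def greedy_max_sum_diversify(tokens: np.ndarray, k: int) -> list:
    """Greedily pick k row indices of `tokens` (shape (n, d)) that approximately
    maximize the sum of pairwise cosine distances among the picked rows."""
    n = tokens.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be in [1, n]")
    # L2-normalize rows so that dot products are cosine similarities.
    norms = np.maximum(np.linalg.norm(tokens, axis=1, keepdims=True), 1e-12)
    normed = tokens / norms
    dist = 1.0 - normed @ normed.T  # (n, n) cosine distance matrix
    # Assumed seed: the token with the largest total distance to all others.
    first = int(dist.sum(axis=1).argmax())
    selected = [first]
    chosen = np.zeros(n, dtype=bool)
    chosen[first] = True
    # acc[i] = summed distance from token i to every token selected so far.
    acc = dist[first].copy()
    while len(selected) < k:
        # Greedy step: add the unselected token that most increases the sum.
        nxt = int(np.where(chosen, -np.inf, acc).argmax())
        selected.append(nxt)
        chosen[nxt] = True
        acc += dist[nxt]
    return selected
```

For instance, `greedy_max_sum_diversify(features, 58)` on 576 token features keeps about 10% of them, roughly the 90% pruning ratio quoted in the Findings. Because each step adds the token with the largest summed distance to the current selection, max-sum favors spread-out, peripheral tokens but can still pick near-duplicates at the extremes of the feature space; max-min (farthest-point) selection is a common alternative when that matters.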

Taxonomy

Methods: Softmax · Attention Is All You Need · Pruning