ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
Duo Li, Zuhao Yang, Xiaoqin Zhang, Ling Shao, Shijian Lu

TL;DR
ToDRE introduces a training-free visual token pruning method that improves LVLM inference efficiency by treating token diversity and task relevance separately, achieving a 2.6x inference speed-up while retaining 95% of model performance.
Contribution
The paper proposes a two-stage, training-free framework that effectively prunes visual tokens by leveraging token diversity and task relevance, improving inference speed in large vision-language models.
Findings
Prunes 90% of visual tokens after the vision encoder.
Achieves 2.6x inference speed-up while maintaining 95% of model performance.
Effectively prunes all visual tokens in certain LLM decoder layers.
Abstract
Visual token pruning, which compresses and prunes redundant visual tokens, plays a critical role in efficient inference with large vision-language models (LVLMs). However, most existing work estimates visual redundancy using a single metric, such as cross-modal attention or visual token similarity. We show that visual token diversity and task-specific token relevance are two crucial yet orthogonal factors that complement each other in conveying useful information, and should therefore be treated separately for more effective visual token pruning. Building upon this insight, we design ToDRE, a two-stage, training-free framework that incorporates Token Diversity and task RElevance for effective token compression and efficient LVLM inference. Instead of pruning redundant tokens, we introduce a greedy max-sum diversification algorithm that selects and retains a subset of diverse and…
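The diversity stage can be illustrated with a textbook greedy max-sum diversification over visual token embeddings: start from one token, then repeatedly add the candidate whose summed distance to the already-selected tokens is largest. This is a minimal sketch under stated assumptions, not the authors' implementation. The cosine distance, the seeding rule (the token with the largest total distance to all others), and the helper names `cosine_distance` and `greedy_max_sum_diversify` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat zero vectors as uncorrelated
    return 1.0 - dot / (norm_a * norm_b)


def greedy_max_sum_diversify(tokens: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick k token indices that approximately maximize the sum of
    pairwise cosine distances inside the retained set.

    Hypothetical sketch: the seeding rule and the distance are assumptions,
    and the paper may initialize or score candidates differently.
    """
    n = len(tokens)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)
    # Pairwise distance matrix; O(n^2) memory is acceptable for a sketch.
    dist = [[cosine_distance(tokens[i], tokens[j]) for j in range(n)] for i in range(n)]

    # Seed with the most "isolated" token (ties broken by lowest index).
    seed = max(range(n), key=lambda i: sum(dist[i]))
    selected = [seed]
    # gain[i] = summed distance from candidate i to every selected token.
    gain = list(dist[seed])
    remaining = set(range(n)) - {seed}

    while len(selected) < k:
        # Iterate in index order so ties deterministically go to the lowest index.
        best = max(sorted(remaining), key=lambda i: gain[i])
        selected.append(best)
        remaining.remove(best)
        # Incrementally update each candidate's marginal gain.
        for i in remaining:
            gain[i] += dist[best][i]

    # Return indices in original order so retained tokens keep their layout.
    return sorted(selected)
```

For example, `greedy_max_sum_diversify([[1, 0], [1, 0], [0, 1]], 2)` returns `[0, 2]`: the duplicate token at index 1 is dropped in favor of the dissimilar one. Under the 90% pruning ratio reported in the Findings, `k` would be roughly one tenth of the visual token count. The incremental `gain` update keeps each greedy step linear in the number of remaining candidates instead of recomputing every pairwise sum.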
Taxonomy
Methods: Softmax · Attention Is All You Need · Pruning
