ID-Selection: Importance-Diversity Based Visual Token Selection for Efficient LVLM Inference

Zhaohong Huang; Wenjing Liu; Yuxin Zhang; Fei Chao; Rongrong Ji

arXiv:2604.05601 · cs.CV · April 8, 2026

TL;DR

ID-Selection is a visual token selection method for large vision-language models (LVLMs) that combines importance estimation with diversity-aware iterative selection. On LLaVA-1.5-7B it prunes 97.2% of tokens while retaining 91.8% of the original performance.

Contribution

ID-Selection introduces a unified importance-diversity token selection strategy that outperforms existing methods, especially at high pruning ratios, and requires no extra training.

Findings

01. Prunes 97.2% of tokens on LLaVA-1.5-7B, reducing FLOPs by over 97%.

02. Retains 91.8% of original performance with only 16 visual tokens.

03. Consistently improves efficiency across 5 LVLM backbones and 16 benchmarks.
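
The 97.2% pruning ratio and the 16-token budget in findings 01 and 02 appear to describe the same setting. The short check below assumes LLaVA-1.5-7B produces 576 visual tokens per image, a figure not stated on this page; the helper name `pruned_percentage` is hypothetical.

```python
def pruned_percentage(total_tokens, kept_tokens):
    """Return the percentage of tokens removed when only kept_tokens remain."""
    return 100.0 * (total_tokens - kept_tokens) / total_tokens


# Assumption: 576 visual tokens per image for LLaVA-1.5-7B (not given in the source).
TOTAL_VISUAL_TOKENS = 576
KEPT_TOKENS = 16  # token budget quoted in finding 02

pruned = pruned_percentage(TOTAL_VISUAL_TOKENS, KEPT_TOKENS)
print(f"{pruned:.1f}% of visual tokens pruned")  # prints "97.2% of visual tokens pruned"
```

Under that assumption, keeping 16 of 576 tokens matches the reported 97.2% reduction.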

Abstract

Recent advances have explored visual token pruning to accelerate the inference of large vision-language models (LVLMs). However, existing methods often struggle to balance token importance and diversity: importance-based methods tend to retain redundant tokens, whereas diversity-based methods may overlook informative ones. This trade-off becomes especially problematic under high reduction ratios, where preserving only a small subset of visual tokens is critical. To address this issue, we propose ID-Selection, a simple yet effective token selection strategy for efficient LVLM inference. The key idea is to couple importance estimation with diversity-aware iterative selection: each token is first assigned an importance score, after which high-scoring tokens are selected one by one while the scores of similar tokens are progressively suppressed. In this way, ID-Selection preserves…
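
The selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how importance is computed or how scores are suppressed, so the cosine-similarity measure, the multiplicative suppression rule `score *= 1 - similarity`, and the name `id_select_sketch` are all assumptions made for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def id_select_sketch(features, importance, k):
    """Greedily pick k token indices, trading off importance and diversity.

    features:   one feature vector per token
    importance: one non-negative importance score per token
    Assumption: after each pick, the scores of the remaining tokens are
    multiplied by (1 - cosine similarity to the picked token).
    """
    feats = [_normalize(f) for f in features]
    scores = [float(s) for s in importance]
    remaining = list(range(len(scores)))
    selected = []
    while remaining and len(selected) < k:
        # Pick the remaining token with the highest current score.
        best = max(remaining, key=lambda i: scores[i])
        selected.append(best)
        remaining.remove(best)
        # Suppress the scores of tokens similar to the one just picked.
        for i in remaining:
            sim = sum(a * b for a, b in zip(feats[i], feats[best]))
            sim = min(max(sim, 0.0), 1.0)
            scores[i] *= 1.0 - sim
    return selected


# Token 1 is nearly identical to token 0; token 2 is distinct but less important.
demo_features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
demo_importance = [1.0, 0.9, 0.5]
picked = id_select_sketch(demo_features, demo_importance, k=2)
print(picked)  # prints [0, 2]
```

Plain top-2 selection by importance would keep tokens 0 and 1, which are near-duplicates. In this sketch the suppression step pushes token 1's score close to zero after token 0 is chosen, so the distinct token 2 is kept instead.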
