Balancing Saliency and Coverage: Semantic Prominence-Aware Budgeting for Visual Token Compression in VLMs

Jaehoon Lee; Mingi Jung; Soohyuk Jang; Seungryong Yoo; Dahuin Jung; Sungroh Yoon

arXiv:2603.14892 · cs.CV · March 17, 2026

TL;DR

This paper introduces PromPrune, an adaptive visual token selection method for VLMs that balances saliency and coverage according to each sample's semantic prominence. On LLaVA-NeXT-7B it reduces FLOPs by 88% while maintaining 97.5% of the original accuracy.

Contribution

The paper proposes a novel sample-adaptive token compression framework that dynamically balances local saliency and global coverage in VLMs, outperforming static methods.

Findings

01 Reduces FLOPs by 88% on LLaVA-NeXT-7B

02 Decreases prefill latency by 22%

03 Maintains 97.5% of original accuracy

Abstract

Large Vision-Language Models (VLMs) achieve strong multimodal understanding capabilities by leveraging high-resolution visual inputs, but the resulting large number of visual tokens creates a major computational bottleneck. Recent work mitigates this issue through visual token compression, typically compressing tokens based on saliency, diversity, or a fixed combination of both. We observe that the distribution of semantic prominence varies substantially across samples, leading to different optimal trade-offs between local saliency preservation and global coverage. This observation suggests that applying a static compression strategy across all samples can be suboptimal. Motivated by this insight, we propose PromPrune, a sample-adaptive visual token selection framework composed of semantic prominence-aware budget allocation and a two-stage selection pipeline. Our method adaptively…
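
The idea of splitting a token budget between saliency and coverage can be illustrated with a small sketch. This is not PromPrune's actual algorithm: the abstract above is truncated and does not specify how prominence is measured or how the two stages work. The sketch assumes, purely for illustration, that prominence is the normalized-entropy concentration of per-token saliency scores, that the first stage keeps the top-scoring tokens, and that the second stage fills the rest of the budget by greedy farthest-point selection. All function names (`prominence_concentration`, `split_budget`, `select_tokens`) are hypothetical.

```python
import math


def prominence_concentration(saliency):
    """Assumed prominence measure: 1 minus normalized entropy of saliency.

    Close to 1 when saliency mass sits on a few tokens, close to 0 when it
    is spread evenly. This is an illustrative choice, not the paper's.
    """
    n = len(saliency)
    total = sum(saliency)
    if n < 2 or total <= 0:
        return 0.0
    probs = [s / total for s in saliency]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(n)


def split_budget(saliency, budget):
    """Split the budget into (saliency slots, coverage slots)."""
    k_sal = round(budget * prominence_concentration(saliency))
    k_sal = max(0, min(budget, k_sal))
    return k_sal, budget - k_sal


def _dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_tokens(features, saliency, budget):
    """Two-stage selection: top-k saliency, then farthest-point coverage."""
    n = len(features)
    budget = min(budget, n)
    k_sal, k_cov = split_budget(saliency, budget)

    # Stage 1 (assumed): keep the k_sal most salient tokens.
    order = sorted(range(n), key=lambda i: saliency[i], reverse=True)
    selected = order[:k_sal]
    remaining = set(order[k_sal:])

    # Distance from each remaining token to its nearest selected token.
    min_d = {
        i: min((_dist2(features[i], features[j]) for j in selected),
               default=math.inf)
        for i in remaining
    }

    # Stage 2 (assumed): repeatedly add the token farthest from the current
    # selection; ties go to higher saliency, then to the lower index.
    for _ in range(k_cov):
        pick = max(remaining, key=lambda i: (min_d[i], saliency[i], -i))
        selected.append(pick)
        remaining.remove(pick)
        for i in remaining:
            min_d[i] = min(min_d[i], _dist2(features[i], features[pick]))

    return sorted(selected)
```

Under these assumptions, an image with one dominant object would spend most of its budget on the salient tokens, while an image with evenly spread saliency would spend it on spatially or semantically diverse tokens. A fixed split would treat both cases the same, which is the limitation of static strategies that the abstract describes.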

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning