Energy-Driven Adaptive Visual Token Pruning for Efficient Vision-Language Models

Jialuo He; Huangxun Chen

arXiv:2603.05950·cs.CV·March 9, 2026



TL;DR

E-AdaPrune is an energy-driven adaptive visual token pruning method that allocates tokens based on image information density, improving efficiency and performance in vision-language models without extra learnable parameters.

Contribution

The paper introduces E-AdaPrune, a novel spectral energy-based adaptive pruning framework that dynamically allocates visual tokens without additional parameters, enhancing model efficiency.

Findings

01

Achieves an average improvement of up to 0.6% across nine benchmarks under matched average token budgets.

02

Boosts MMVet reasoning performance by a relative +5.1%.

03

Maintains a low latency of 8 ms per image using randomized SVD.

Abstract

Visual token reduction is critical for accelerating Vision-Language Models (VLMs), yet most existing approaches rely on a fixed budget shared across all inputs, overlooking the substantial variation in image information density. We propose E-AdaPrune, an energy-driven adaptive pruning framework that determines the token budget from the singular value spectrum of the visual feature space. By preserving a certain proportion of spectral energy, our method allocates more tokens to information-dense scenes while aggressively compressing redundant ones, without introducing additional learnable parameters. We evaluate E-AdaPrune on nine benchmarks and three VLM backbones: LLaVA-1.5-7B, LLaVA-1.5-13B, and LLaVA-NeXT-8B. Under matched average token budgets, E-AdaPrune consistently yields an average improvement of up to 0.6%, including a significant +5.1% relative boost on the MMVet reasoning…
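The idea of deriving a token budget from preserved spectral energy can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `energy_token_budget`, the `energy_ratio` and `min_tokens` parameters, and the choice to define energy as squared singular values are all assumptions made for illustration. The sketch uses an exact SVD from NumPy for clarity, and it only computes the budget; how tokens are then selected is not covered here.

```python
import numpy as np

def energy_token_budget(features, energy_ratio=0.95, min_tokens=1):
    """Hypothetical helper: smallest k whose top-k singular values
    retain at least `energy_ratio` of the total spectral energy.

    features: (num_tokens, dim) array of visual token features.
    """
    # Singular values are returned in descending order.
    s = np.linalg.svd(features, compute_uv=False)
    # Assumption: "energy" means squared singular values.
    energy = s ** 2
    total = energy.sum()
    if total == 0:
        # Degenerate all-zero input: fall back to the minimum budget.
        return min_tokens
    cumulative = np.cumsum(energy) / total
    # Index of the first entry reaching the threshold, turned into a count.
    k = int(np.searchsorted(cumulative, energy_ratio, side="left")) + 1
    # Keep the budget within [min_tokens, number of singular values].
    return min(max(k, min_tokens), len(s))

# Usage: a redundant (rank-2) feature matrix versus a dense random one.
rng = np.random.default_rng(0)
redundant = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 32))
dense = rng.standard_normal((64, 32))
k_redundant = energy_token_budget(redundant)
k_dense = energy_token_budget(dense)
print(k_redundant, k_dense)
```

Under these assumptions, the redundant input receives a much smaller budget than the dense one, which mirrors the behavior the abstract describes: more tokens for information-dense scenes, aggressive compression for redundant ones.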


Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis