AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

Changwoo Baek; Jouwon Song; Sohyeon Kim; Kyeongbo Kong

arXiv:2603.01236·cs.CV·March 3, 2026

Open Access · 3 Reviews

TL;DR

This paper empirically analyzes attention and diversity-based visual token pruning in large vision-language models, revealing their strengths and limitations, and proposes adaptive strategies that improve performance and reduce hallucinations.

Contribution

It provides a comprehensive empirical analysis of pruning methods, introduces image-aware adjustments, and presents a simple adaptive pruning mechanism with improved results.

Findings

01

Diversity-oriented pruning preserves less feature diversity than intended.

02

Attention-based pruning is more effective on simple images.

03

Diversity-based methods handle complex images better.

Abstract

Large Vision-Language Models (LVLMs) have adopted visual token pruning strategies to mitigate substantial computational overhead incurred by extensive visual token sequences. While prior works primarily focus on either attention-based or diversity-based pruning methods, in-depth analysis of these approaches' characteristics and limitations remains largely unexplored. In this work, we conduct thorough empirical analysis using effective rank (erank) as a measure of feature diversity and attention score entropy to investigate visual token processing mechanisms and analyze the strengths and weaknesses of each approach. Our analysis reveals two insights: (1) Our erank-based quantitative analysis shows that many diversity-oriented pruning methods preserve substantially less feature diversity than intended; moreover, analysis using the CHAIR dataset reveals that the diversity they do retain is…
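The two quantities named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the standard definition of effective rank (the exponential of the Shannon entropy of the normalized singular values) applied to an uncentered token-feature matrix, and it treats attention entropy as the Shannon entropy of a normalized score vector. The function names `effective_rank` and `attention_entropy` are hypothetical, and the paper may preprocess features or attention scores differently.

```python
import numpy as np


def effective_rank(features, eps=1e-12):
    """Effective rank of a (num_tokens, dim) feature matrix.

    Assumed definition: exp of the Shannon entropy of the singular
    values after normalizing them to sum to 1. No centering is applied.
    """
    s = np.linalg.svd(np.asarray(features, dtype=float), compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero singular values
    return float(np.exp(-(p * np.log(p)).sum()))


def attention_entropy(scores, eps=1e-12):
    """Shannon entropy (in nats) of non-negative attention scores.

    The scores are renormalized to sum to 1 before the entropy is taken.
    """
    p = np.asarray(scores, dtype=float)
    p = p / (p.sum() + eps)
    p = p[p > eps]  # zero-probability entries contribute nothing
    return float(-(p * np.log(p)).sum())


# Four orthogonal token features span four directions equally,
# so the effective rank is approximately 4.
diverse = effective_rank(np.eye(4))

# Four identical token features collapse to one direction,
# so the effective rank is approximately 1.
redundant = effective_rank(np.ones((4, 4)))

# Uniform attention over 8 tokens gives the maximum entropy, log(8).
spread = attention_entropy(np.ones(8))
```

Under these assumptions, a higher effective rank means the retained tokens cover more independent feature directions, and a higher attention entropy means attention is spread over more tokens rather than concentrated on a few.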

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 3

Strengths

1. Provides diverse experiments to validate its approach.
2. The paper is well-written and easy to follow.

Weaknesses

1. I am not sure the findings of the paper are novel enough. [1] shows that "a satisfied pruning method should jointly take the token importance and diversity into account" to preserve both local (important) and global (diverse) information, which is what the paper proposes to do.
2. I think an important related work is missing: [2]. It also determines the pruning threshold adaptively, based on the input instance.
3. From my understanding, the number of visual tokens is fixed in the experiments. Why didn’t

Reviewer 02 · Rating 6 · Confidence 2

Strengths

Comprehensive empirical analysis and validation:
- It goes beyond performance reporting and explores why each method behaves differently, grounded in measurable concepts like attention entropy and effective rank (erank).
- Extensive experiments on nine multimodal benchmarks (VQAv2, GQA, TextVQA, ScienceQA, MMBench, etc.) and the CHAIR hallucination dataset demonstrate the robustness of the approach.

Insightful findings with practical relevance:
- The study reveals clear patterns: attention-base

Weaknesses

Limited novelty in algorithmic design:
- The proposed adaptive pruning framework (AdaVTP) mainly combines two existing ideas, attention-based and diversity-based pruning, using an adaptive threshold determined by image complexity.
- While insightful, this combination strategy is heuristic rather than fundamentally new in algorithmic form.

Limited scope of model diversity:
- Most experiments are based on a single LVLM backbone (LLaVA-1.5-7B).
- The generalizability of the findings to other arc

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The paper provides a thorough and systematic empirical comparison between attention- and diversity-based pruning strategies, which has not been explored in depth before.
2. The adoption of effective rank and attention entropy as quantitative measures for image complexity is conceptually reasonable.

Weaknesses

1. The proposed adaptive thresholding strategy is relatively simple and heuristic (a logarithmic mapping between erank and threshold). It does not provide strong methodological or theoretical innovation beyond straightforward empirical observations.
2. The proposed adaptive thresholding strategy introduces several hyperparameters, notably the scaling coefficients and other implementation choices. These parameters may influence pruning behavior, yet the paper does not provide a clear justificati

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning