HiPrune: Hierarchical Attention for Efficient Token Pruning in Vision-Language Models

Jizhihui Liu; Feiyi Du; Guangdao Zhu; Niu Lian; Jun Li; Bin Chen; Weili Guan; Yaowei Wang

arXiv:2508.00553·cs.CV·April 21, 2026

TL;DR

HiPrune is a hierarchical attention-based token pruning method for vision-language models. It uses attention patterns intrinsic to the vision encoder to decide which visual tokens to keep, significantly reducing computation while maintaining high accuracy.

Contribution

The paper proposes a training-free, model-agnostic token pruning method utilizing hierarchical attention patterns, and introduces HiPrune++ for improved instruction following at low token budgets.

Findings

01. Achieves up to 99.3% task accuracy with only 1/3 of the tokens.

02. Reduces inference FLOPs by 58.7%.

03. Maintains up to 99.7% accuracy with 2/9 of the tokens, showing robustness at lower token budgets.
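
To make the token fractions above concrete, the short sketch below converts a keep ratio into an absolute token budget. The total of 576 visual tokens (a 24x24 patch grid) is an assumption chosen for illustration and is not stated on this page; the helper name token_budget is hypothetical.

```python
from fractions import Fraction


def token_budget(total_tokens: int, keep_ratio: Fraction) -> int:
    """Number of visual tokens kept for a given keep ratio (rounded down)."""
    return int(total_tokens * keep_ratio)


# Assumption: the encoder emits 576 visual tokens (24 x 24 patches).
TOTAL_TOKENS = 576

print(token_budget(TOTAL_TOKENS, Fraction(1, 3)))  # 192
print(token_budget(TOTAL_TOKENS, Fraction(2, 9)))  # 128
```

Under that assumption, the 1/3 and 2/9 settings correspond to 192 and 128 retained tokens respectively; a different encoder resolution would change both numbers.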

Abstract

Vision-Language Models (VLMs) encode images and videos into abundant tokens, which carry substantial redundancy and incur high computation cost. While visual token pruning mitigates the issue, most existing methods lack insight into the intrinsic properties of the vision encoder itself. In this work, we dive into the vision encoder and show, qualitatively and quantitatively, that the middle layers pay more attention to the main objects of the image, while the deep layers attend to tokens with rich global information. Utilizing this Hierarchical attention pattern, we propose HiPrune, a training-free and model-agnostic token Pruning method. HiPrune identifies three types of visual tokens according to their attention in different phases of the vision encoder, which preserve different levels of information. By coupling with the similarity of text tokens, we propose a prompt-aware variant, HiPrune++, which…
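
The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository for that). It assumes that per-token attention scores from one middle layer and one deep layer are already available as flat lists over a row-major patch grid, and it assumes one plausible reading of the three token types: tokens with high middle-layer attention, their spatial neighbours, and tokens with high deep-layer attention. All names (select_tokens, mid_attn, deep_attn, n_anchor) are hypothetical.

```python
from typing import AbstractSet, List


def top_k_indices(scores: List[float], k: int,
                  exclude: AbstractSet[int] = frozenset()) -> List[int]:
    """Indices of the k largest scores, skipping excluded indices."""
    candidates = [i for i in range(len(scores)) if i not in exclude]
    # list.sort is stable, so ties keep their original index order.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:k]


def select_tokens(mid_attn: List[float], deep_attn: List[float],
                  grid_w: int, budget: int, n_anchor: int) -> List[int]:
    """Pick `budget` token indices from a row-major patch grid.

    mid_attn[i]  -- attention received by token i at a middle layer (assumed input)
    deep_attn[i] -- attention received by token i at a deep layer (assumed input)
    """
    n = len(mid_attn)
    assert len(deep_attn) == n and n % grid_w == 0
    grid_h = n // grid_w

    # 1) Object-centric tokens: highest middle-layer attention.
    anchors = top_k_indices(mid_attn, min(n_anchor, budget))
    kept = set(anchors)

    # 2) Spatial neighbours (4-neighbourhood) of each anchor, while budget allows.
    for a in anchors:
        r, c = divmod(a, grid_w)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < grid_h and 0 <= cc < grid_w and len(kept) < budget:
                kept.add(rr * grid_w + cc)

    # 3) Global tokens: fill what is left with the highest deep-layer attention.
    remaining = budget - len(kept)
    kept.update(top_k_indices(deep_attn, remaining, exclude=kept))
    return sorted(kept)


# Toy 3x3 grid: the centre patch dominates the middle layer,
# the top-left patch dominates the deep layer.
mid = [0.0, 0.1, 0.0, 0.1, 0.9, 0.1, 0.0, 0.1, 0.0]
deep = [0.8, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.3]
print(select_tokens(mid, deep, grid_w=3, budget=6, n_anchor=1))  # [0, 1, 3, 4, 5, 7]
```

In the toy run, the centre patch is kept for its middle-layer attention, its four neighbours are kept for spatial context, and the last slot goes to the patch with the strongest deep-layer attention. Because the selection reads only attention scores, a sketch like this needs no training, which matches the training-free framing above.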

Code & Models

Repositories

Danielement321/HiPrune (GitHub)