HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models

Qihui Zhu; Tao Zhang; Yuchen Wang; Zijian Wen; Mengjie Zhang; Shuangwu Chen; Xiaobin Tan; Jian Yang; Yang Liu; Zhenhua Dong; Xianzhi Yu; Yinfei Pan

arXiv:2604.07812·cs.CV·April 10, 2026

TL;DR

HAWK is a training-free visual token pruning method for multimodal large language models that weights attention heads by their importance. It retains 96.0% accuracy after pruning 80.2% of visual tokens and reduces end-to-end latency to 74.4% of the original.

Contribution

The paper introduces a head importance-aware approach that combines attention head importance weights with text-guided attention to score and prune visual tokens in MLLMs.

Findings

01 Retains 96.0% accuracy after pruning 80.2% of visual tokens.

02 Reduces end-to-end latency to 74.4% of the original.

03 Decreases GPU memory usage across models.
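
To make the reported ratios concrete, the following minimal sketch converts them into a token budget and a speedup factor. The input size of 576 visual tokens is a hypothetical example (for instance a 24×24 patch grid) and does not come from the paper; the function names are illustrative only.

```python
def retained_tokens(num_visual_tokens: int, prune_ratio: float) -> int:
    """Number of visual tokens left after pruning the given fraction."""
    return round(num_visual_tokens * (1.0 - prune_ratio))


def speedup_from_latency_fraction(latency_fraction: float) -> float:
    """Speedup factor implied by a latency expressed as a fraction of the original."""
    return 1.0 / latency_fraction


# Hypothetical input size: 576 visual tokens (assumption, not from the paper).
kept = retained_tokens(576, 0.802)               # about 114 tokens remain
speedup = speedup_from_latency_fraction(0.744)   # roughly 1.34x end to end
```

Under that assumed input size, pruning 80.2% of tokens leaves roughly one fifth of them, while the end-to-end speedup is more modest because latency also includes work that pruning does not remove.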

Abstract

In multimodal large language models (MLLMs), the surge of visual tokens significantly increases the inference time and computational overhead, making them impractical for real-time or resource-constrained applications. Visual token pruning is a promising strategy for reducing the cost of MLLM inference by removing redundant visual tokens. Existing research usually assumes that all attention heads contribute equally to the visual interpretation. However, our study reveals that different heads may capture distinct visual semantics and inherently play distinct roles in visual processing. In light of this observation, we propose HAWK, a head importance-aware visual token pruning method that perceives the varying importance of attention heads in visual tasks to maximize the retention of crucial tokens. By leveraging head importance weights and text-guided attention to assess visual token…
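
The abstract is truncated here, so the exact scoring rule is not shown. As a rough illustration of the general idea only, the sketch below scores each visual token by a head-weighted average of the attention that text tokens pay to it, then keeps the top-scoring tokens. The tensor layout, the function name `prune_visual_tokens`, the normalization, and the toy numbers are all assumptions for illustration and are not taken from the paper or its repository.

```python
def prune_visual_tokens(attn, head_weights, keep_ratio):
    """Select visual tokens to keep (illustrative sketch, not the authors' code).

    attn[h][t][v]: attention from text token t to visual token v in head h
                   (assumed layout).
    head_weights[h]: non-negative importance weight of head h (assumed given).
    keep_ratio: fraction of visual tokens to retain.
    """
    num_visual = len(attn[0][0])
    total_weight = sum(head_weights)
    scores = [0.0] * num_visual
    for head, weight in zip(attn, head_weights):
        num_text = len(head)
        for v in range(num_visual):
            # Text-guided attention: mean attention from text tokens to token v.
            text_guided = sum(row[v] for row in head) / num_text
            # Heads with larger importance weights contribute more to the score.
            scores[v] += (weight / total_weight) * text_guided
    num_keep = max(1, round(keep_ratio * num_visual))
    ranked = sorted(range(num_visual), key=lambda v: scores[v], reverse=True)
    # Return kept indices in their original order so positions stay consistent.
    return sorted(ranked[:num_keep]), scores


# Toy example: 2 heads, 2 text tokens, 4 visual tokens (made-up values).
toy_attn = [
    [[0.1, 0.6, 0.2, 0.1], [0.1, 0.5, 0.3, 0.1]],  # head 0
    [[0.7, 0.1, 0.1, 0.1], [0.5, 0.1, 0.2, 0.2]],  # head 1
]
keep_weighted, _ = prune_visual_tokens(toy_attn, [0.9, 0.1], keep_ratio=0.5)
keep_uniform, _ = prune_visual_tokens(toy_attn, [0.5, 0.5], keep_ratio=0.5)
```

In this toy case the head-weighted run keeps tokens 1 and 2, while equal head weights keep tokens 0 and 1, which shows how the choice of head weights can change which tokens survive pruning.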

Code & Models

Repositories

peppery77/HAWK.git (GitHub)
