Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?

Zichen Wen; Yifeng Gao; Weijia Li; Conghui He; Linfeng Zhang

arXiv:2502.11501 · cs.CL · May 30, 2025

TL;DR

This paper critically examines token pruning methods in multimodal large language models, questioning their effectiveness and evaluation protocols, and provides insights for designing better token pruning strategies.

Contribution

The paper analyzes existing token pruning approaches, identifies their limitations, and offers insights to guide future research in improving token pruning techniques.

Findings

01. Many existing approaches underperform compared to naive random selection

02. Attention-based scoring may not reliably identify redundant tokens

03. Current evaluation protocols may be biased or incomplete

Abstract

Multimodal large language models (MLLMs) have shown remarkable performance in cross-modal understanding and generation, yet still suffer from severe inference costs. Recently, many methods have been proposed to address this problem with token pruning, which identifies redundant tokens in MLLMs and prunes them to reduce computation and KV storage costs, yielding significant acceleration without training. While these methods claim efficiency gains, critical questions about their fundamental design and evaluation remain unanswered: Why do many existing approaches underperform even naive random token selection? Is attention-based scoring sufficient for reliably identifying redundant tokens? Is language information really helpful during token pruning? What makes a good trade-off between token importance and duplication? Are current evaluation protocols…
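
To make the comparison in the abstract concrete, the following is a minimal sketch of the two selection strategies it contrasts: keeping a random subset of tokens versus keeping the tokens with the highest attention score. It is not the paper's implementation. The function names, the toy token labels, and the score values are all hypothetical, and in a real MLLM the scores would come from the model's attention maps rather than a hand-written list.

```python
import random


def prune_random(num_tokens, keep, seed=0):
    """Keep `keep` token indices chosen uniformly at random (hypothetical baseline)."""
    rng = random.Random(seed)
    # Sort so the surviving tokens stay in their original sequence order.
    return sorted(rng.sample(range(num_tokens), keep))


def prune_by_attention(scores, keep):
    """Keep the `keep` token indices with the highest scores (hypothetical scorer)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore original sequence order among the kept tokens.
    return sorted(ranked[:keep])


def apply_pruning(tokens, kept_indices):
    """Return only the tokens whose indices were kept."""
    return [tokens[i] for i in kept_indices]


# Toy data: 8 visual tokens with made-up attention scores (assumption, not real model output).
visual_tokens = [f"v{i}" for i in range(8)]
attention_scores = [0.02, 0.30, 0.05, 0.01, 0.25, 0.04, 0.03, 0.30]

kept_attn = prune_by_attention(attention_scores, keep=3)
kept_rand = prune_random(len(visual_tokens), keep=3)

print("attention-based:", apply_pruning(visual_tokens, kept_attn))
print("random:         ", apply_pruning(visual_tokens, kept_rand))
```

Both functions return the same kind of result, a sorted list of kept indices, so either can be swapped into the same pipeline and evaluated at an identical token budget. That is the kind of like-for-like comparison the abstract's first question presupposes.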

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Pruning