EntropyPrune: Matrix Entropy Guided Visual Token Pruning for Multimodal Large Language Models
Yahong Wang, Juncheng Wu, Zhangkai Ni, Chengmei Yang, Yihang Liu, Longzhen Yang, Yuyin Zhou, Ying Wen, Lianghua He

TL;DR
EntropyPrune is a matrix-entropy-based visual token pruning method for multimodal large language models. It identifies an "Entropy Collapse Layer" as the pruning stage and reduces FLOPs by up to 68.2% with minimal performance loss.
Contribution
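The paper proposes an entropy-guided pruning framework that replaces static, empirically selected pruning layers with a principled criterion, making it more interpretable and transferable across models, and that uses spectral properties for efficient computation.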
Findings
Achieves up to 68.2% FLOPs reduction with minimal performance loss
Outperforms existing pruning methods in accuracy and efficiency
Demonstrates robustness across high-resolution and video models
Abstract
Multimodal large language models (MLLMs) incur substantial inference cost due to the processing of hundreds of visual tokens per image. Although token pruning has proven effective for accelerating inference, determining when and where to prune remains largely heuristic. Existing approaches typically rely on static, empirically selected layers, which limit interpretability and transferability across models. In this work, we introduce a matrix-entropy perspective and identify an "Entropy Collapse Layer" (ECL), where the information content of visual representations exhibits a sharp and consistent drop, which provides a principled criterion for selecting the pruning stage. Building on this observation, we propose EntropyPrune, a novel matrix-entropy-guided token pruning framework that quantifies the information value of individual visual tokens and prunes redundant ones without relying on…
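The abstract does not give the paper's exact definition of matrix entropy, so the following is only a minimal sketch under an assumed, commonly used form: the von Neumann entropy of the trace-normalized Gram matrix of L2-normalized token features. The function name `matrix_entropy` and the `eps` parameter are hypothetical and not taken from the paper.

```python
import numpy as np

def matrix_entropy(tokens: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy of the eigenvalue spectrum of a trace-normalized Gram matrix.

    tokens: array of shape (num_tokens, hidden_dim), e.g. the visual-token
    hidden states at one layer. This is an assumed definition, not
    necessarily the one used by EntropyPrune.
    """
    # L2-normalize each token so the measure does not depend on feature scale.
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    x = tokens / np.maximum(norms, eps)

    # (hidden_dim x hidden_dim) matrix; it shares its nonzero eigenvalues
    # with the (num_tokens x num_tokens) Gram matrix x @ x.T.
    k = x.T @ x
    k = k / np.trace(k)  # eigenvalues now sum to 1

    # Symmetric matrix, so eigvalsh applies; clip tiny negative round-off.
    eigvals = np.clip(np.linalg.eigvalsh(k), 0.0, None)
    nonzero = eigvals[eigvals > eps]

    # Shannon entropy of the spectrum (0 * log 0 treated as 0).
    return float(-(nonzero * np.log(nonzero)).sum())
```

Under this assumed definition, a set of identical tokens has entropy near zero, and n mutually orthogonal tokens have entropy log(n). Evaluating it on the visual tokens at each layer is one way to look for the sharp drop the abstract calls the Entropy Collapse Layer.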
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
