EntropyPrune: Matrix Entropy Guided Visual Token Pruning for Multimodal Large Language Models

Yahong Wang; Juncheng Wu; Zhangkai Ni; Chengmei Yang; Yihang Liu; Longzhen Yang; Yuyin Zhou; Ying Wen; Lianghua He

arXiv:2602.17196 · cs.CV · February 20, 2026

TL;DR

EntropyPrune is a matrix-entropy-guided visual token pruning method for multimodal large language models. It uses an "Entropy Collapse Layer", where the information content of visual representations drops sharply, as the pruning stage, and reduces FLOPs by up to 68.2% with minimal performance loss.

Contribution

The paper proposes an entropy-guided pruning framework that is more interpretable and transferable than approaches relying on static, empirically selected pruning layers, and that uses spectral properties of the token representations for efficient computation.

Findings

01. Achieves up to 68.2% FLOPs reduction with minimal performance loss

02. Outperforms existing pruning methods in accuracy and efficiency

03. Demonstrates robustness across high-resolution and video models

Abstract

Multimodal large language models (MLLMs) incur substantial inference cost due to the processing of hundreds of visual tokens per image. Although token pruning has proven effective for accelerating inference, determining when and where to prune remains largely heuristic. Existing approaches typically rely on static, empirically selected layers, which limit interpretability and transferability across models. In this work, we introduce a matrix-entropy perspective and identify an "Entropy Collapse Layer" (ECL), where the information content of visual representations exhibits a sharp and consistent drop, which provides a principled criterion for selecting the pruning stage. Building on this observation, we propose EntropyPrune, a novel matrix-entropy-guided token pruning framework that quantifies the information value of individual visual tokens and prunes redundant ones without relying on…
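The excerpt above does not give the paper's exact formula, so the following is only a minimal sketch of one common way to measure matrix entropy: the von Neumann-style entropy of a trace-normalized Gram matrix built from token features. The function names `matrix_entropy` and `find_collapse_layer`, the L2 normalization step, and the "largest drop between consecutive layers" rule are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def matrix_entropy(tokens: np.ndarray, eps: float = 1e-12) -> float:
    """Von Neumann-style entropy of a (num_tokens, dim) feature matrix.

    Assumed definition (not taken from the paper): entropy of the
    eigenvalue spectrum of the trace-normalized token Gram matrix.
    Assumes at least one token has a nonzero norm.
    """
    # L2-normalize each token so every token contributes equally to the trace.
    z = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + eps)
    # Gram matrix is symmetric positive semidefinite; dividing by the trace
    # makes its eigenvalues sum to 1, so they behave like a distribution.
    gram = z @ z.T
    gram = gram / np.trace(gram)
    eigvals = np.linalg.eigvalsh(gram)
    # Drop numerically zero (or slightly negative) eigenvalues before the log.
    eigvals = eigvals[eigvals > eps]
    return float(-np.sum(eigvals * np.log(eigvals)))


def find_collapse_layer(per_layer_tokens):
    """Return (layer_index, entropies) for the sharpest entropy drop.

    Hypothetical selection rule: pick the layer whose entropy falls the
    most relative to the previous layer.
    """
    entropies = [matrix_entropy(t) for t in per_layer_tokens]
    drops = np.diff(entropies)  # entropies[i + 1] - entropies[i]
    return int(np.argmin(drops)) + 1, entropies
```

Under this definition, a set of mutually orthogonal tokens reaches the maximum entropy (the log of the token count), while a set of identical tokens has entropy close to zero, which is the sense in which low entropy signals redundant visual tokens.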

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning