MI-Pruner: Crossmodal Mutual Information-guided Token Pruner for Efficient MLLMs

Jiameng Li; Aleksei Tiulpin; Matthew B. Blaschko

arXiv:2604.03072·cs.CV·April 6, 2026

TL;DR

This paper introduces MI-Pruner, a novel crossmodal mutual information-based token pruning method for multimodal large language models, improving efficiency without architectural changes.

Contribution

The paper measures mutual information directly between visual and textual features to decide which visual tokens to prune, and reports that this outperforms attention-based pruning methods.

Findings

01 Outperforms previous attention-based pruning methods

02 Requires no internal attention maps or architectural modifications

03 Adds minimal latency

Abstract

For multimodal large language models (MLLMs), visual information is relatively sparse compared with text. As a result, research on visual token pruning has emerged to make inference more efficient. Current approaches typically measure token importance based on the attention scores in the visual encoder or in the LLM decoder, then keep the visual tokens with high attention scores and prune the rest. In this paper, we pursue a different and more surgical approach. Instead of relying on mechanism-specific signals, we directly compute Mutual Information (MI) between the visual and textual features themselves, prior to their interaction. This allows us to explicitly measure crossmodal dependency at the feature level. Our MI-Pruner is simple, efficient and non-intrusive, requiring no access to internal attention maps or architectural modifications. Experimental results demonstrate that our approach outperforms…
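To make the idea in the abstract concrete, here is a minimal sketch of MI-guided pruning over pre-interaction features. The abstract does not say how MI is estimated, so everything below is an assumption made for illustration rather than the authors' method: the joint distribution is taken as a softmax over cosine similarities between visual and text tokens, each visual token is scored by its share of the resulting MI, and the names `mi_scores`, `prune_visual_tokens`, `temperature`, and `keep` are hypothetical.

```python
import numpy as np


def mi_scores(visual, text, temperature=0.1):
    """Per-visual-token contribution to a simple MI estimate.

    ASSUMPTION (not from the paper): p(v, t) is a softmax over
    cosine similarities between visual and text token features.
    visual: (Nv, D) array, text: (Nt, D) array.
    """
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)

    # Joint distribution over (visual token, text token) pairs,
    # computed in log space for numerical stability.
    logits = (v @ t.T) / temperature
    logits = logits - logits.max()
    log_joint = logits - np.log(np.exp(logits).sum())
    joint = np.exp(log_joint)

    # Marginals over visual tokens (rows) and text tokens (columns).
    log_pv = np.log(joint.sum(axis=1, keepdims=True))
    log_pt = np.log(joint.sum(axis=0, keepdims=True))

    # Row i holds sum_t p(v_i, t) * log(p(v_i, t) / (p(v_i) p(t))).
    # The rows sum to the total MI between the two token sets.
    return (joint * (log_joint - log_pv - log_pt)).sum(axis=1)


def prune_visual_tokens(visual, text, keep):
    """Keep the `keep` highest-scoring visual tokens, in original order."""
    scores = mi_scores(visual, text)
    top = np.argsort(scores)[::-1][:keep]
    idx = np.sort(top)  # preserve the original token order
    return visual[idx], idx
```

Because the scores depend only on the two feature matrices, a routine like this could run before the tokens reach the LLM decoder and needs no attention maps, which matches the non-intrusive property the abstract describes. The choice of similarity, temperature, and scoring rule would all need to be checked against the full paper.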
