IPCV: Information-Preserving Compression for MLLM Visual Encoders

Yuan Chen; Zichen Wen; Yuzhou Wu; Xuyang Liu; Shuang Chen; Junpeng Ma; Weijia Li; Conghui He; Linfeng Zhang

arXiv:2512.18747·cs.CV·December 23, 2025

TL;DR

IPCV is a training-free, information-preserving token compression framework for multimodal large language model visual encoders that reduces computation while maintaining performance by selectively pruning tokens with minimal information loss.

Contribution

IPCV introduces a novel, training-free token pruning method with neighbor-guided reconstruction and attention stabilization to improve efficiency of MLLM visual encoders.

Findings

01. Significantly reduces computational cost in MLLM visual encoders.
02. Outperforms existing training-free token compression methods.
03. Effective across diverse image and video benchmarks.

Abstract

Multimodal Large Language Models (MLLMs) deliver strong vision-language performance but at high computational cost, driven by numerous visual tokens processed by the Vision Transformer (ViT) encoder. Existing token pruning strategies are inadequate: LLM-stage token pruning overlooks the ViT's overhead, while conventional ViT token pruning, without language guidance, risks discarding textually critical visual cues and introduces feature distortions amplified by the ViT's bidirectional attention. To meet these challenges, we propose IPCV, a training-free, information-preserving compression framework for MLLM visual encoders. IPCV enables aggressive token pruning inside the ViT via Neighbor-Guided Reconstruction (NGR) that temporarily reconstructs pruned tokens to participate in attention with minimal overhead, then fully restores them before passing to the LLM. Besides, we introduce…
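The abstract describes Neighbor-Guided Reconstruction only at a high level, so the following is a rough illustrative sketch of the general idea rather than the paper's algorithm. It assumes that tokens are pruned by some importance score, that an encoder block is applied only to the kept tokens, and that each pruned token is then approximated by adding the average feature update of its most similar kept tokens. The function names (`prune_tokens`, `nearest_kept_neighbors`, `reconstruct_pruned`), the norm-based score, the cosine-similarity neighbor search, and the scaling used as a stand-in for a ViT block are all hypothetical choices made for this example. The sketch also does not model how reconstructed tokens participate in attention inside the ViT.

```python
import numpy as np


def prune_tokens(tokens, scores, keep_ratio):
    """Split token indices into kept and pruned sets using a (hypothetical) importance score."""
    n = tokens.shape[0]
    n_keep = max(1, int(round(n * keep_ratio)))
    order = np.argsort(-scores, kind="stable")  # highest score first
    keep_idx = np.sort(order[:n_keep])
    drop_idx = np.sort(order[n_keep:])
    return keep_idx, drop_idx


def nearest_kept_neighbors(tokens, keep_idx, drop_idx, k):
    """For each pruned token, return positions (into keep_idx) of its k most similar kept tokens."""
    normed = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sim = normed[drop_idx] @ normed[keep_idx].T  # cosine similarity, shape (n_drop, n_keep)
    k = min(k, keep_idx.size)
    return np.argsort(-sim, axis=1)[:, :k]


def reconstruct_pruned(original, updated_kept, keep_idx, drop_idx, nn):
    """Assumed reconstruction rule: pruned token = original feature + mean update of its kept neighbors."""
    delta = updated_kept - original[keep_idx]  # per-kept-token update, shape (n_keep, d)
    restored = original.copy()
    restored[keep_idx] = updated_kept
    if drop_idx.size:
        # delta[nn] has shape (n_drop, k, d); average over the k neighbors.
        restored[drop_idx] = original[drop_idx] + delta[nn].mean(axis=1)
    return restored


rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))            # 16 visual tokens, 8-dim features (toy sizes)
scores = np.linalg.norm(tokens, axis=1)      # placeholder importance score (assumption)
keep_idx, drop_idx = prune_tokens(tokens, scores, keep_ratio=0.5)
nn = nearest_kept_neighbors(tokens, keep_idx, drop_idx, k=2)
updated_kept = tokens[keep_idx] * 1.1        # stand-in for a ViT block run on kept tokens only
full = reconstruct_pruned(tokens, updated_kept, keep_idx, drop_idx, nn)
print(full.shape)  # (16, 8)
```

In this toy setup the expensive step (here, the stand-in scaling) touches only half of the tokens, while the output still has one feature vector per original token, which mirrors the abstract's point that pruned tokens are restored before being passed to the LLM.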

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Multimodal Machine Learning Applications