Towards Efficient VLMs: Information-Theoretic Driven Compression via Adaptive Structural Pruning

Zhaoqi Xu; Yingying Zhang; Jian Li; Jianwei Guo; Qiannan Zhu; Hua Huang

arXiv:2511.19518 · cs.CV · November 26, 2025

Open Access

TL;DR

This paper introduces InfoPrune, an information-theoretic framework that adaptively compresses vision-language models by pruning attention heads and applying low-rank approximation, achieving up to 3.2x FLOP reduction and 1.8x acceleration with negligible performance loss.

Contribution

The work presents a novel, theoretically grounded method for VLM compression using the Information Bottleneck principle, entropy-based metrics, and adaptive pruning schemes.
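
For orientation, the generic Information Bottleneck objective can be written as below. This is the textbook form of the principle, not necessarily the exact loss used in the paper; the symbols X (input), Y (task target), Z (compressed representation), and the trade-off weight β are notation assumed here for illustration.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

Minimizing I(X; Z) discards information about the input, while the term weighted by β rewards keeping what is predictive of the target. Pruning under this view removes structure that contributes to the first term without helping the second.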

Findings

1. Achieves up to 3.2x FLOP reduction and 1.8x acceleration.

2. Maintains performance with negligible degradation on VQAv2, TextVQA, and GQA.

3. Provides a unified, information-theoretic criterion for structural sparsity and efficiency.

Abstract

Recent advances in vision-language models (VLMs) have shown remarkable performance across multimodal tasks, yet their ever-growing scale poses severe challenges for deployment and efficiency. Existing compression methods often rely on heuristic importance metrics or empirical pruning rules, lacking theoretical guarantees about information preservation. In this work, we propose InfoPrune, an information-theoretic framework for adaptive structural compression of VLMs. Grounded in the Information Bottleneck principle, we formulate pruning as a trade-off between retaining task-relevant semantics and discarding redundant dependencies. To quantify the contribution of each attention head, we introduce an entropy-based effective rank (eRank) and employ the Kolmogorov-Smirnov (KS) distance to measure the divergence between original and compressed structures. This yields a unified criterion that…
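
The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes a common definition of entropy-based effective rank (the exponential of the Shannon entropy of the normalized singular values) and the standard two-sample KS statistic (the largest gap between two empirical CDFs). The paper's exact formulation and the way it applies them to attention heads are not given in this excerpt, so the function names `effective_rank`, `ks_distance`, and `truncate_rank`, and the random stand-in matrix, are hypothetical.

```python
import numpy as np


def effective_rank(matrix: np.ndarray, eps: float = 1e-12) -> float:
    """Exponential of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(matrix, compute_uv=False)
    total = s.sum()
    if total <= eps:
        return 0.0  # convention chosen here for an all-zero matrix
    p = s / total
    p = p[p > eps]  # drop numerically zero entries so log() is defined
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


def ks_distance(a, b) -> float:
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a = np.sort(np.ravel(a))
    b = np.sort(np.ravel(b))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def truncate_rank(matrix: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation via truncated SVD (illustrative compression)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    head = rng.standard_normal((16, 16))  # stand-in for one head's weight matrix
    compressed = truncate_rank(head, k=4)

    print("eRank original:  ", effective_rank(head))
    print("eRank compressed:", effective_rank(compressed))
    print("KS distance:     ", ks_distance(head, compressed))
```

In this sketch a lower effective rank indicates that a matrix concentrates its energy in fewer directions, and a larger KS distance indicates that the compressed entries are distributed less like the original ones. How the paper combines the two into a single pruning criterion is not specified in the text above.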

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis