Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency

Yiran Huang; Lukas Thede; Massimiliano Mancini; Wenjia Xu; Zeynep Akata

arXiv:2604.24380 · cs.CL · April 28, 2026

TL;DR

This paper explores structured pruning methods for large vision-language models, demonstrating effective compression with minimal performance loss using limited data and lightweight recovery techniques.

Contribution

It studies two structural pruning paradigms for the language model backbone, layerwise and widthwise pruning, and pairs them with supervised finetuning and knowledge distillation for recovery, yielding practical strategies for efficient LVLM compression.
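Layerwise pruning removes entire decoder blocks from the language model backbone, while widthwise pruning keeps every block but shrinks its internal dimensions. The following is a minimal pure-Python sketch of both operations on toy weights. It assumes a precomputed per-block importance score for layerwise pruning and a weight-magnitude score over MLP intermediate channels for widthwise pruning. Both scoring rules, and the function names `layerwise_prune` and `widthwise_prune_mlp`, are hypothetical illustrations rather than the paper's criteria, and the widthwise example prunes only the MLP intermediate dimension.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def layerwise_prune(layers: Sequence, scores: Sequence[float], n_keep: int) -> list:
    """Drop whole decoder blocks, keeping the n_keep highest-scoring ones.

    `scores` is an assumed per-block importance (for example, how much a block
    changes its input hidden state). Survivors keep their original order.
    """
    ranked = sorted(range(len(layers)), key=lambda i: scores[i], reverse=True)
    return [layers[i] for i in sorted(ranked[:n_keep])]


def widthwise_prune_mlp(w_up: Matrix, w_down: Matrix, n_keep: int) -> Tuple[Matrix, Matrix, List[int]]:
    """Remove MLP intermediate channels from both projections consistently.

    w_up:   (d_ff, d_model), row j writes intermediate channel j.
    w_down: (d_model, d_ff), column j reads intermediate channel j.
    Channel score (an assumption): ||w_up[j, :]|| * ||w_down[:, j]||.
    """
    def score(j: int) -> float:
        up_norm = math.sqrt(sum(v * v for v in w_up[j]))
        down_norm = math.sqrt(sum(row[j] ** 2 for row in w_down))
        return up_norm * down_norm

    # Keep the n_keep highest-scoring channels, restored to their original order.
    kept = sorted(sorted(range(len(w_up)), key=score, reverse=True)[:n_keep])
    new_up = [list(w_up[j]) for j in kept]
    new_down = [[row[j] for j in kept] for row in w_down]
    return new_up, new_down, kept


# Usage on toy inputs: 4 blocks, and an MLP with d_model=2, d_ff=3.
print(layerwise_prune(["block0", "block1", "block2", "block3"], [0.9, 0.1, 0.5, 0.7], n_keep=3))
# ['block0', 'block2', 'block3']
up, down, kept = widthwise_prune_mlp([[1, 0], [0, 0.1], [2, 2]], [[1, 0.1, 1], [0, 0.1, 1]], n_keep=2)
print(kept, up, down)
# [0, 2] [[1, 0], [2, 2]] [[1, 1], [0, 1]]
```

Pruning a channel means dropping row j of the up-projection and column j of the down-projection together, so the pruned MLP still composes correctly. A real implementation would apply the same indexing to framework tensors (and to any gate projection) rather than to Python lists.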

Findings

01. Widthwise pruning outperforms layerwise pruning in low-resource scenarios.
02. Finetuning only the multimodal projector suffices at small compression levels.
03. Effective recovery is achievable with just 5% of the original data.
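The second and third findings describe a lightweight recovery setup: train only the multimodal projector and use a small random slice of the data. Below is a minimal sketch of both pieces, assuming parameters are identified by dotted names and the recovery set is drawn uniformly at random with a fixed seed. The parameter names and the helpers `trainable_mask` and `subsample` are hypothetical, and the paper's actual subset selection may differ.

```python
import random
from typing import Dict, List, Sequence


def trainable_mask(param_names: Sequence[str], prefixes: Sequence[str]) -> Dict[str, bool]:
    """Return {name: is_trainable}; only names under one of `prefixes` train."""
    return {name: name.startswith(tuple(prefixes)) for name in param_names}


def subsample(dataset: Sequence, fraction: float, seed: int = 0) -> List:
    """Draw a reproducible random subset, e.g. fraction=0.05 for 5% of the data."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(dataset) * fraction))
    return random.Random(seed).sample(list(dataset), k)


# Hypothetical parameter names for a pruned LVLM (vision encoder, projector, LLM).
names = [
    "vision_tower.layers.0.attn.q_proj.weight",
    "multi_modal_projector.linear_1.weight",
    "multi_modal_projector.linear_2.weight",
    "language_model.layers.0.mlp.up_proj.weight",
]
mask = trainable_mask(names, prefixes=["multi_modal_projector."])
print([n for n, trainable in mask.items() if trainable])
# ['multi_modal_projector.linear_1.weight', 'multi_modal_projector.linear_2.weight']
print(len(subsample(range(1000), fraction=0.05)))
# 50
```

In a PyTorch training loop, such a mask would typically be applied by setting `requires_grad` on each named parameter, so the optimizer only updates the projector.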

Abstract

While Large Vision Language Models (LVLMs) demonstrate impressive capabilities, their substantial computational and memory requirements pose deployment challenges on resource-constrained edge devices. Current parameter reduction techniques primarily involve training LVLMs from small language models, but these methods offer limited flexibility and remain computationally intensive. We study a complementary route: compressing existing LVLMs by applying structured pruning to the language model backbone, followed by lightweight recovery training. Specifically, we investigate two structural pruning paradigms, layerwise and widthwise pruning, and pair them with supervised finetuning and knowledge distillation on logits and hidden states. Additionally, we assess the feasibility of conducting recovery training with only a small fraction of the available data. Our results show that widthwise…
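The abstract pairs pruning with knowledge distillation on logits and hidden states. A minimal per-token sketch of such a combined objective follows, assuming a temperature-scaled KL term on the logits and a mean-squared-error term on one matched pair of hidden states. The weighting `alpha`, the temperature, and the function names are hypothetical choices, not the paper's reported configuration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_kd_loss(student: Sequence[float], teacher: Sequence[float], temperature: float = 2.0) -> float:
    """KL(teacher || student) between temperature-softened distributions, scaled by T^2."""
    p_t = softmax(teacher, temperature)
    p_s = softmax(student, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


def hidden_kd_loss(student_h: Sequence[float], teacher_h: Sequence[float]) -> float:
    """Mean squared error between a matched pair of student/teacher hidden states."""
    return sum((s - t) ** 2 for s, t in zip(student_h, teacher_h)) / len(student_h)


def distillation_loss(student_logits, teacher_logits, student_h, teacher_h,
                      alpha: float = 0.5, temperature: float = 2.0) -> float:
    """alpha * logit term + (1 - alpha) * hidden-state term (alpha is a hypothetical weight)."""
    return (alpha * logit_kd_loss(student_logits, teacher_logits, temperature)
            + (1 - alpha) * hidden_kd_loss(student_h, teacher_h))


# Per-token toy example: a 3-way vocabulary and a 2-dimensional hidden state.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.5, 0.5], [0.5, 0.5]))  # 0.0
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], [0.0, 1.0], [0.5, 0.5]) > 0)  # True
```

In practice these terms would be averaged over tokens and computed with a tensor library. The hidden-state term assumes the student and teacher share a hidden size; when layerwise pruning removes blocks, each student layer also has to be assigned a teacher layer to match, and that mapping is not specified here.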

Code & Models

Repositories

YiranHuangIrene/VLMCompression.git (GitHub)
