Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study

Yiran Huang; Lukas Thede; Massimiliano Mancini; Wenjia Xu; Zeynep Akata

arXiv:2507.20749·cs.CL·July 29, 2025

TL;DR

This paper explores structural pruning and recovery training methods to compress multimodal large language models efficiently, achieving high performance retention with minimal data and computational resources.

Contribution

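It investigates two structural pruning paradigms, layerwise and widthwise, applied to the language model backbone of MLLMs, and combines them with recovery training (supervised finetuning and knowledge distillation), showing that existing MLLMs can be compressed effectively with limited data and compute.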
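The difference between the two pruning paradigms can be illustrated on a toy stack of feed-forward layers. This is a minimal sketch under stated assumptions, not the authors' implementation: the layer layout (`up`/`down` weight lists), the helper names, the choice of which layers to drop, and the norm-based importance score are all hypothetical and chosen only to keep the example self-contained.

```python
# Toy illustration only. Each "layer" is a dict with an up-projection
# (hidden x d_model) and a down-projection (d_model x hidden), stored as lists.
# The selection criteria below are assumptions, not the paper's criteria.

def layerwise_prune(layers, drop_indices):
    """Remove whole layers; surviving layers keep their full width."""
    drop = set(drop_indices)
    return [layer for i, layer in enumerate(layers) if i not in drop]

def widthwise_prune(layers, keep_ratio):
    """Keep every layer but shrink the hidden width of each one."""
    pruned = []
    for layer in layers:
        up, down = layer["up"], layer["down"]
        hidden = len(up)
        n_keep = max(1, int(hidden * keep_ratio))
        # Hypothetical importance: squared L2 norm of each hidden unit's input weights.
        scores = [sum(w * w for w in row) for row in up]
        ranked = sorted(range(hidden), key=lambda j: scores[j], reverse=True)
        keep = sorted(ranked[:n_keep])
        pruned.append({
            "up": [up[j] for j in keep],                      # drop rows
            "down": [[row[j] for j in keep] for row in down],  # drop matching columns
        })
    return pruned

def count_params(layers):
    """Total number of weights across all layers."""
    return sum(
        sum(len(r) for r in layer["up"]) + sum(len(r) for r in layer["down"])
        for layer in layers
    )

def make_layer(d_model, hidden, seed):
    """Deterministic dummy weights so the example is reproducible."""
    up = [[((seed + 3 * i + 7 * j) % 11 - 5) / 10.0 for j in range(d_model)]
          for i in range(hidden)]
    down = [[((seed + 5 * i + 2 * j) % 13 - 6) / 10.0 for j in range(hidden)]
            for i in range(d_model)]
    return {"up": up, "down": down}

model = [make_layer(d_model=4, hidden=8, seed=s) for s in range(6)]
by_depth = layerwise_prune(model, drop_indices=[3, 4, 5])  # 6 layers -> 3 layers
by_width = widthwise_prune(model, keep_ratio=0.5)          # hidden 8 -> 4 in every layer

print(count_params(model), count_params(by_depth), count_params(by_width))
```

Both calls remove half of the toy model's weights, but layerwise pruning does so by reducing depth while widthwise pruning keeps the depth and narrows each layer. In either case the pruned model would then go through recovery training.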

Findings

01. Widthwise pruning generally performs better than layerwise pruning in low-resource scenarios.

02. Recovery training with only 5% of the data retains over 95% of the original performance.

03. Finetuning only the multimodal projector suffices at low compression levels.

Abstract

While Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities, their substantial computational and memory requirements pose significant barriers to practical deployment. Current parameter reduction techniques primarily involve training MLLMs from Small Language Models (SLMs), but these methods offer limited flexibility and remain computationally intensive. To address this gap, we propose to directly compress existing MLLMs through structural pruning combined with efficient recovery training. Specifically, we investigate two structural pruning paradigms--layerwise and widthwise pruning--applied to the language model backbone of MLLMs, alongside supervised finetuning and knowledge distillation. Additionally, we assess the feasibility of conducting recovery training with only a small fraction of the available data. Our results show that widthwise pruning generally…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods