Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study
Yiran Huang, Lukas Thede, Massimiliano Mancini, Wenjia Xu, Zeynep Akata

TL;DR
This paper explores structural pruning and recovery training methods to compress multimodal large language models efficiently, achieving high performance retention with minimal data and computational resources.
Contribution
It introduces layerwise and widthwise pruning paradigms combined with recovery training techniques, demonstrating effective compression of MLLMs with limited data and resources.
Findings
Widthwise pruning performs better in low-resource scenarios.
Recovery training with only 5% data retains over 95% performance.
Finetuning the multimodal projector suffices at low compression levels.
Abstract
While Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities, their substantial computational and memory requirements pose significant barriers to practical deployment. Current parameter reduction techniques primarily involve training MLLMs from Small Language Models (SLMs), but these methods offer limited flexibility and remain computationally intensive. To address this gap, we propose to directly compress existing MLLMs through structural pruning combined with efficient recovery training. Specifically, we investigate two structural pruning paradigms--layerwise and widthwise pruning--applied to the language model backbone of MLLMs, alongside supervised finetuning and knowledge distillation. Additionally, we assess the feasibility of conducting recovery training with only a small fraction of the available data. Our results show that widthwise pruning generally…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
