LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling
Zhihan Zhang, Xiang Pan, Hongchen Wei, Zhenzhong Chen

TL;DR
LOP introduces an efficient neural pruning framework that learns optimal pruning strategies directly from the target pruning constraint, significantly reducing computational overhead and outperforming existing methods for deploying multimodal large language models (MLLMs).
Contribution
The paper presents a novel pruning approach that trains neural networks to predict pruning strategies without iterative search, enabling fast and adaptive model compression.
Findings
LOP achieves up to a 1000x speedup over traditional search-based pruning methods.
LOP outperforms state-of-the-art pruning techniques across multiple tasks.
The method effectively adapts to various pruning constraints.
Abstract
Structural pruning techniques are essential for deploying multimodal large language models (MLLMs) across various hardware platforms, from edge devices to cloud servers. However, current pruning methods typically determine optimal strategies through iterative search processes, which incurs substantial computational overhead for on-demand MLLM adaptation. To address this challenge, we propose LOP, an efficient neural pruning framework that learns optimal pruning strategies from the target pruning constraint, eliminating the need for computationally expensive search-based methods. The LOP approach trains autoregressive neural networks (NNs) to directly predict layer-wise pruning strategies that adapt to the target pruning constraint, removing the time-consuming iterative search. Experimental results across multiple tasks show that LOP outperforms state-of-the-art pruning methods in…
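The abstract describes an autoregressive network that maps a target pruning constraint to a layer-wise pruning strategy in a single pass, with no per-constraint search. The sketch below is a toy, untrained illustration of that idea and is not the authors' implementation. Several pieces are assumptions made for illustration only: the hypothetical class `ToyAutoregressivePolicy` and its tanh recurrence, the per-step inputs (`target_ratio`, the previous layer's ratio, relative depth), and the hypothetical helper `ratios_for_budget`. That helper uses a bisection-based calibration so that the parameter-weighted mean pruning ratio meets the target exactly.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyAutoregressivePolicy:
    """Hypothetical, untrained toy policy: emits one logit per layer,
    conditioned on the target pruning ratio and the previous layer's ratio."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # Per-step input features (assumed): [target_ratio, previous_ratio, relative_depth]
        self.w_in = [[rng.gauss(0, 0.5) for _ in range(3)] for _ in range(hidden)]
        self.w_h = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(hidden)]
        self.w_out = [rng.gauss(0, 0.5) for _ in range(hidden)]

    def layer_logits(self, target_ratio, num_layers):
        h = [0.0] * self.hidden
        prev = target_ratio  # seed the recurrence with the constraint itself
        logits = []
        for layer in range(num_layers):
            x = [target_ratio, prev, layer / max(num_layers - 1, 1)]
            # Simple tanh recurrence: new hidden state from input and old state
            h = [
                math.tanh(
                    sum(w * v for w, v in zip(self.w_in[j], x))
                    + sum(w * v for w, v in zip(self.w_h[j], h))
                )
                for j in range(self.hidden)
            ]
            logit = sum(w * v for w, v in zip(self.w_out, h))
            logits.append(logit)
            prev = sigmoid(logit)  # feed this layer's (pre-calibration) ratio forward
        return logits


def ratios_for_budget(logits, layer_params, target_ratio, iters=60):
    """Hypothetical calibration step: shift every logit by one shared offset,
    found by bisection, so the parameter-weighted mean ratio equals target_ratio."""
    total = sum(layer_params)

    def weighted_mean(offset):
        return sum(sigmoid(z + offset) * p for z, p in zip(logits, layer_params)) / total

    lo, hi = -50.0, 50.0  # weighted_mean is increasing in the offset
    for _ in range(iters):
        mid = (lo + hi) / 2
        if weighted_mean(mid) < target_ratio:
            lo = mid
        else:
            hi = mid
    offset = (lo + hi) / 2
    return [sigmoid(z + offset) for z in logits]


# Usage: one forward pass per constraint, with no search over candidate strategies.
policy = ToyAutoregressivePolicy(seed=0)
layer_params = [4, 8, 8, 8, 4]  # hypothetical per-layer parameter counts
for target in (0.2, 0.5):
    logits = policy.layer_logits(target, len(layer_params))
    ratios = ratios_for_budget(logits, layer_params, target)
    print(target, [round(r, 3) for r in ratios])
```

In this sketch, changing the constraint only means calling `layer_logits` again. A search-based method would instead have to evaluate many candidate strategies for each new target. The calibration step is a stand-in for training: a real learned policy would be optimized so that its raw outputs already respect the constraint and preserve accuracy.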
Taxonomy
Topics: Speech and dialogue systems
