One-for-All Pruning: A Universal Model for Customized Compression of Large Language Models
Rongguang Ye, Ming Tang

TL;DR
This paper introduces UniCuCo, a universal model for efficient, customized pruning of large language models that significantly reduces processing time for multiple requests while maintaining accuracy.
Contribution
We propose UniCuCo, which uses a Gaussian process as a fast, differentiable approximation of the otherwise costly and non-differentiable evaluation of pruning strategies, allowing efficient model compression across many requests.
Findings
28 times faster than baselines in processing 64 requests
Maintains comparable accuracy to existing methods
Effective approximation of the non-differentiable pruning process with a Gaussian process
Abstract
Existing pruning methods for large language models (LLMs) focus on achieving high compression rates while maintaining model performance. Although these methods have demonstrated satisfactory performance in handling a single user's compression request, their processing time increases linearly with the number of requests, making them inefficient for real-world scenarios with multiple simultaneous requests. To address this limitation, we propose a Universal Model for Customized Compression (UniCuCo) for LLMs, which introduces a StratNet that learns to map arbitrary requests to their optimal pruning strategy. The challenge in training StratNet lies in the high computational cost of evaluating pruning strategies and the non-differentiable nature of the pruning process, which hinders gradient backpropagation for StratNet updates. To overcome these challenges, we leverage a Gaussian process to…
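The abstract is cut off before it describes how the Gaussian process is used, so the following is not the authors' procedure. It is only a minimal sketch of the general idea of a differentiable surrogate: fit a Gaussian process to a handful of (pruning strategy, measured score) pairs, then use the gradient of its posterior mean in place of the gradient that the real evaluation cannot provide. Every name here (`toy_evaluate`, `rbf_kernel`, `fit_gp`, `gp_mean`, `gp_mean_grad`), the two-dimensional "per-layer sparsity" strategy, the RBF kernel, and the hyperparameters are assumptions made for illustration.

```python
import numpy as np

def toy_evaluate(strategy):
    # Hypothetical stand-in for "prune the model, then measure performance".
    # Rounding makes the score piecewise constant, so it has no useful gradient.
    kept = np.round((1.0 - strategy) * 16) / 16
    return float(np.sqrt(kept).mean())

def rbf_kernel(A, B, length_scale=0.5):
    # Squared-exponential kernel between rows of A (n, d) and rows of B (m, d).
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def fit_gp(X, y, length_scale=0.5, noise=1e-2):
    # Returns alpha = (K + noise * I)^{-1} y, which defines the posterior mean.
    K = rbf_kernel(X, X, length_scale) + noise * np.eye(len(X))
    return np.linalg.solve(K, y)

def gp_mean(x, X, alpha, length_scale=0.5):
    # Posterior mean at a single strategy x: sum_i alpha_i * k(x, x_i).
    return rbf_kernel(x[None, :], X, length_scale)[0] @ alpha

def gp_mean_grad(x, X, alpha, length_scale=0.5):
    # Analytic gradient of the posterior mean with respect to x, using
    # d k(x, x_i) / dx = (x_i - x) / length_scale^2 * k(x, x_i).
    k = rbf_kernel(x[None, :], X, length_scale)[0]
    return ((X - x) / length_scale**2 * (k * alpha)[:, None]).sum(axis=0)

# Sample a few strategies (sparsity per layer, assumed in [0, 0.9]) and score them.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.9, size=(40, 2))
y = np.array([toy_evaluate(s) for s in X])
y_mean = y.mean()
alpha = fit_gp(X, y - y_mean)  # fit on centered scores

# The surrogate gives both a cheap score estimate and a gradient.
x = np.array([0.4, 0.5])
print("surrogate score:", gp_mean(x, X, alpha) + y_mean)
print("surrogate gradient:", gp_mean_grad(x, X, alpha))
```

In a setup like the one the abstract describes, a gradient of this kind could be passed back through a strategy-generating network, since the surrogate is smooth where the real pruning-and-evaluation step is not. How UniCuCo actually builds and trains its surrogate is not stated in the text available here.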
Taxonomy
Topics: Big Data and Digital Economy · Topic Modeling · Natural Language Processing Techniques
Methods: Focus · Gaussian Process · Pruning
