One-for-All Pruning: A Universal Model for Customized Compression of Large Language Models

Rongguang Ye; Ming Tang

arXiv:2505.12216·cs.CL·May 27, 2025

TL;DR

This paper introduces UniCuCo, a universal model for efficient, customized pruning of large language models that significantly reduces processing time for multiple requests while maintaining accuracy.

Contribution

We propose UniCuCo, which uses a Gaussian process as a fast, differentiable approximation of the non-differentiable pruning process, enabling efficient model compression across multiple requests.

Findings

01. 28 times faster than baselines in processing 64 requests

02. Maintains comparable accuracy to existing methods

03. Effective approximation of non-differentiable pruning with Gaussian process

Abstract

Existing pruning methods for large language models (LLMs) focus on achieving high compression rates while maintaining model performance. Although these methods have demonstrated satisfactory performance in handling a single user's compression request, their processing time increases linearly with the number of requests, making them inefficient for real-world scenarios with multiple simultaneous requests. To address this limitation, we propose a Universal Model for Customized Compression (UniCuCo) for LLMs, which introduces a StratNet that learns to map arbitrary requests to their optimal pruning strategy. The challenge in training StratNet lies in the high computational cost of evaluating pruning strategies and the non-differentiable nature of the pruning process, which hinders gradient backpropagation for StratNet updates. To overcome these challenges, we leverage a Gaussian process to…
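
The abstract is cut off before it says how the Gaussian process is used, so the following is only a generic illustration of the underlying idea, not the paper's implementation. It fits a Gaussian process with an RBF kernel to a handful of (pruning strategy, score) pairs and then reads off a closed-form gradient of the posterior mean, which is the property that lets a surrogate stand in for a non-differentiable evaluation. The names `GPSurrogate` and `toy_evaluate`, the three-dimensional strategy vector, and all hyperparameters are assumptions made for this sketch.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


class GPSurrogate:
    """Hypothetical GP regression surrogate with an analytic mean gradient."""

    def __init__(self, length_scale=0.5, noise=1e-3):
        self.length_scale = length_scale
        self.noise = noise

    def fit(self, strategies, scores):
        self.X = np.asarray(strategies, dtype=float)
        y = np.asarray(scores, dtype=float)
        K = rbf_kernel(self.X, self.X, self.length_scale)
        K += self.noise * np.eye(len(self.X))
        # alpha = (K + noise * I)^{-1} y, reused by every prediction.
        self.alpha = np.linalg.solve(K, y)
        return self

    def mean(self, x):
        """Posterior mean at a single strategy vector x of shape (d,)."""
        k = rbf_kernel(x[None, :], self.X, self.length_scale)[0]
        return float(k @ self.alpha)

    def mean_grad(self, x):
        """Gradient of the posterior mean with respect to x, shape (d,)."""
        k = rbf_kernel(x[None, :], self.X, self.length_scale)[0]
        weights = (k * self.alpha)[:, None]
        return ((self.X - x) * weights).sum(axis=0) / self.length_scale**2


def toy_evaluate(strategy):
    """Assumed stand-in for the expensive step of pruning and scoring an LLM."""
    return float(np.exp(-np.sum((strategy - 0.3) ** 2)))


rng = np.random.default_rng(0)
# Each row is a made-up strategy, e.g. per-block sparsity ratios in [0, 1].
train_strategies = rng.uniform(0.0, 1.0, size=(30, 3))
train_scores = np.array([toy_evaluate(s) for s in train_strategies])

gp = GPSurrogate().fit(train_strategies, train_scores)

query = np.array([0.4, 0.5, 0.6])
pred = gp.mean(query)       # cheap prediction, no pruning run needed
grad = gp.mean_grad(query)  # gradient a strategy-generating network could use
```

In a setup like the one the abstract describes, a gradient of this kind could in principle be passed back into a network that outputs strategies, so the costly pruning and evaluation step would only be needed to collect the training pairs for the surrogate.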

Taxonomy

Topics: Big Data and Digital Economy · Topic Modeling · Natural Language Processing Techniques

Methods: Focus · Gaussian Process · Pruning