Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models
Song Guo, Jiahang Xu, Li Lyna Zhang, Mao Yang

TL;DR
Compresso is a structured pruning method that uses collaborative prompting to compress large language models. It prunes LLaMA-7B to 5.4B parameters while maintaining or improving performance.
Contribution
The paper presents a new pruning paradigm that combines a resource-efficient pruning algorithm with collaborative prompting, so that structured pruning decisions are learned during instruction tuning.
Findings
Pruned LLaMA-7B to 5.4B parameters while maintaining or improving performance.
Outperformed one-shot pruning baselines across multiple benchmarks.
Scored up to 11.43% higher than those baselines on key NLP tasks.
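To put the first finding in perspective, the size reduction can be worked out directly. The sketch below assumes the nominal parameter counts implied by the model names (7B and 5.4B); exact counts for the actual checkpoints may differ slightly, and the helper name `fraction_removed` is illustrative only.

```python
def fraction_removed(original_params: float, pruned_params: float) -> float:
    """Return the fraction of parameters removed by pruning."""
    if original_params <= 0:
        raise ValueError("original_params must be positive")
    return (original_params - pruned_params) / original_params


# Nominal sizes in billions of parameters (assumed from the model names).
removed = fraction_removed(7.0, 5.4)
print(f"{removed:.1%} of parameters removed")
```

Under these nominal figures, a little under a quarter of the parameters are removed.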
Abstract
Despite the remarkable success of Large Language Models (LLMs), their massive size poses significant deployment challenges, particularly on resource-constrained hardware. While existing LLM compression methods focus on quantization, pruning remains relatively unexplored due to the high cost of training-based approaches and data collection challenges. One-shot pruning methods, which are cost-effective and data-free, have become dominant in LLM pruning, but they degrade performance under the structured pruning setting. In this work, we introduce a new paradigm for structurally pruning LLMs, called Compresso. Our approach, through the collaboration of the proposed resource-efficient pruning algorithm and the LLM itself, learns optimal pruning decisions during the training process. Compresso addresses the challenges of expensive training costs and data collection by incorporating Low-Rank…
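As a rough illustration of what "structured" pruning means in the abstract, the sketch below removes whole rows (output units) of a weight matrix according to a binary mask, so the result is a smaller dense matrix rather than a sparse one. This is not Compresso's algorithm: the mask here is hard-coded, whereas the paper learns its pruning decisions during training, and the names `prune_rows`, `count_params`, `weights`, and `mask` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def prune_rows(weights: Matrix, mask: List[int]) -> Matrix:
    """Keep only the rows whose mask entry is 1 (whole output units are dropped)."""
    if len(weights) != len(mask):
        raise ValueError("mask must have one entry per row")
    return [row for row, keep in zip(weights, mask) if keep == 1]


def count_params(weights: Matrix) -> int:
    """Total number of scalar weights in the matrix."""
    return sum(len(row) for row in weights)


# Toy layer: 4 output units with 3 inputs each.
weights = [
    [0.2, -0.1, 0.5],
    [0.0, 0.3, -0.4],
    [0.7, 0.1, 0.1],
    [-0.2, 0.6, 0.0],
]
mask = [1, 0, 1, 1]  # hypothetical decision: drop the second unit

pruned = prune_rows(weights, mask)
print(count_params(weights), "->", count_params(pruned))
```

In a real network, the matching input columns of the following layer would have to be removed as well so that the shapes stay consistent.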
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: LLaMA · Pruning
