Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models

Song Guo; Jiahang Xu; Li Lyna Zhang; Mao Yang

arXiv:2310.05015 · cs.AI · October 12, 2023 · 1 citation

TL;DR

Compresso introduces a collaborative prompt-based structured pruning method that efficiently compresses large language models like LLaMA-7B, maintaining or improving performance while significantly reducing model size.

Contribution

The paper presents a pruning paradigm that combines a resource-efficient pruning algorithm with collaborative prompting, enabling effective structured pruning during instruction tuning.

Findings

01 Pruned LLaMA-7B to 5.4B parameters with maintained or improved performance.

02 Outperformed one-shot pruning baselines across multiple benchmarks.

03 Achieved up to 11.43% higher scores than one-shot pruning baselines on reading comprehension.

Abstract

Despite the remarkable success of Large Language Models (LLMs), the massive size poses significant deployment challenges, particularly on resource-constrained hardware. While existing LLM compression methods focus on quantization, pruning remains relatively unexplored due to the high cost of training-based approaches and data collection challenges. One-shot pruning methods, although cost-effective and data-free, have become dominant in LLM pruning, but lead to performance decline under the structured pruning setting. In this work, we introduce a new paradigm for structurally pruning LLMs, called Compresso. Our approach, through the collaboration of the proposed resource-efficient pruning algorithm and the LLM itself, learns optimal pruning decisions during the training process. Compresso addresses the challenges of expensive training costs and data collection by incorporating Low-Rank…
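To make the term "structured pruning" in the abstract concrete, the following is a minimal sketch in plain Python. It is not Compresso's algorithm: the helper names `prune_rows` and `param_count`, the toy weight matrix, and the hand-picked `keep_mask` are all hypothetical. It only illustrates that structured pruning removes whole units (here, rows of a weight matrix), so the result is a smaller dense matrix rather than a sparse one of the same shape. A training-based method would learn the mask instead of fixing it by hand.

```python
from typing import List

Matrix = List[List[float]]


def prune_rows(weight: Matrix, keep_mask: List[int]) -> Matrix:
    """Structured pruning: drop whole rows (output units) whose mask entry is 0."""
    if len(weight) != len(keep_mask):
        raise ValueError("keep_mask needs exactly one entry per row")
    return [row for row, keep in zip(weight, keep_mask) if keep]


def param_count(weight: Matrix) -> int:
    """Number of scalar parameters stored in the matrix."""
    return sum(len(row) for row in weight)


# Hypothetical 4x3 layer. The mask is chosen by hand for illustration;
# a training-based pruning method would learn it.
weight = [
    [0.50, -1.00, 0.20],
    [0.00, 0.10, 0.00],
    [1.50, 0.30, -0.70],
    [0.05, 0.00, 0.02],
]
keep_mask = [1, 0, 1, 0]

pruned = prune_rows(weight, keep_mask)
# Fraction of parameters removed by dropping the masked rows.
sparsity = 1 - param_count(pruned) / param_count(weight)
```

Because entire rows disappear, the pruned layer can run on ordinary dense kernels without sparse-matrix support, which is the practical appeal of structured over unstructured pruning.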

Code & Models

Repositories

microsoft/moonlit (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: LLaMA · Pruning