SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar, Dan Alistarh

TL;DR
This paper introduces SparseGPT, a novel pruning method that enables one-shot pruning of large GPT models to at least 50% sparsity with minimal accuracy loss, without retraining, and efficiently handles models like OPT-175B and BLOOM-176B.
Contribution
SparseGPT is the first pruning method capable of achieving high sparsity in large GPT models in one shot without retraining, significantly reducing inference costs.
Findings
Models pruned to at least 50% sparsity in one shot lose minimal accuracy.
SparseGPT prunes the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours.
SparseGPT reaches 60% unstructured sparsity with a negligible increase in perplexity.
Abstract
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
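The semi-structured patterns named in the abstract can be illustrated with a small sketch. It builds an n:m mask in which every block of m consecutive weights keeps n entries and zeroes the rest; for 2:4 and 4:8 this gives exactly 50% sparsity. The function names `nm_mask` and `sparsity` are hypothetical, and the weights to keep are chosen by plain magnitude, which is only a stand-in: SparseGPT uses its own selection and weight-update procedure, which is not reproduced here.

```python
from typing import List


def nm_mask(weights: List[float], n: int = 2, m: int = 4) -> List[int]:
    """Return a 0/1 mask that keeps n weights in every block of m.

    Assumption: weights are ranked by absolute value (magnitude pruning).
    This is an illustrative criterion, not the one SparseGPT uses.
    """
    if len(weights) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = [0] * len(weights)
    for start in range(0, len(weights), m):
        block = range(start, start + m)
        # Indices of the n largest-magnitude entries in this block.
        keep = sorted(block, key=lambda i: abs(weights[i]), reverse=True)[:n]
        for i in keep:
            mask[i] = 1
    return mask


def sparsity(mask: List[int]) -> float:
    """Fraction of weights that the mask sets to zero."""
    return 1.0 - sum(mask) / len(mask)


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
mask = nm_mask(row, n=2, m=4)
pruned = [w * k for w, k in zip(row, mask)]
print(mask, sparsity(mask))
```

Unlike unstructured sparsity, where zeros may fall anywhere in the matrix, the n:m constraint fixes the number of zeros in each block, so the position of each surviving weight within its block can be stored compactly.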
Code & Models
- 🤗 Thireus/Vicuna13B-v1.1-8bit-128g · model · 8 dl · ♡ 16
- 🤗 RedHatAI/mpt-7b-gsm8k-pruned50-quant-ds · model · 11 dl
- 🤗 RedHatAI/mpt-7b-gsm8k-pruned40-quant-ds · model · 8 dl
- 🤗 RedHatAI/mpt-7b-gsm8k-pruned60-quant-ds · model · 8 dl
- 🤗 RedHatAI/mpt-7b-gsm8k-pruned70-quant-ds · model · 8 dl · ♡ 1
- 🤗 RedHatAI/mpt-7b-gsm8k-pruned80-quant-ds · model · 8 dl · ♡ 2
- 🤗 RedHatAI/mpt-7b-gsm8k-pruned75-quant-ds · model · 9 dl
- 🤗 RedHatAI/mpt-7b-gsm8k-pruned60-pt · model · 6 dl
- 🤗 RedHatAI/zephyr-7b-beta-pruned50-quant-ds · model · 5 dl
- 🤗 RedHatAI/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds · model · 9 dl
Taxonomy
Topics: Topic Modeling · Ferroelectric and Negative Capacitance Devices · Advanced Data Storage Technologies
Methods: Pruning
