SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh

TL;DR
SpQR is a compression method for large language models that isolates outlier weights and quantizes the rest to 3-4 bits per parameter, achieving near-lossless accuracy and enabling deployment on consumer hardware.
Contribution
Introduces SpQR, a new sparse-quantized format that enables near-lossless compression of LLMs by isolating outliers and compressing remaining weights, improving deployment efficiency.
Findings
Keeps the relative perplexity loss below 1% on LLaMA and Falcon models.
Enables running a 33B-parameter model on a single 24 GB GPU with a 15% speedup.
Provides efficient encoding and decoding algorithms for SpQR.
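The 24 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the helper name weight_memory_gb is hypothetical, it counts weight storage alone (no activations, KV cache, or format overhead), and the bit widths are the 16-bit baseline plus the 3-4 bit range mentioned in the abstract.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage needed for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # 33B parameters at the 16-bit baseline and at 4 and 3 bits per parameter.
    for bits in (16, 4, 3):
        print(f"{bits:>2} bits/param: {weight_memory_gb(33e9, bits):.1f} GB")
```

At 16 bits the weights alone need 66 GB, far beyond a 24 GB card, while at 3-4 bits they need roughly 12-17 GB, which leaves headroom for runtime state.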
Abstract
Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities. By compressing such LLMs via quantization to 3-4 bits per parameter, they can fit into memory-limited devices such as laptops and mobile phones, enabling personalized use. However, quantization down to 3-4 bits per parameter usually leads to moderate-to-high accuracy losses, especially for smaller models in the 1-10B parameter range, which are well-suited for edge deployments. To address this accuracy issue, we introduce the Sparse-Quantized Representation (SpQR), a new compressed format and quantization technique which enables for the first time near-lossless compression of LLMs across model scales, while reaching similar compression levels to previous methods. SpQR works by identifying and isolating outlier weights, which cause particularly-large quantization errors, and…
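The idea of isolating outliers before quantizing can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract excerpt does not specify how outliers are detected or how weights are grouped, so the code assumes a simple stand-in (largest-magnitude weights are treated as outliers, and a single scale covers all remaining weights). The function names and the outlier_fraction parameter are hypothetical.

```python
def quantize_with_outliers(weights, bits=3, outlier_fraction=0.01):
    """Quantize a flat list of floats, keeping a few outliers unquantized.

    Assumption: outliers are the largest-magnitude weights. This is a
    stand-in criterion chosen for illustration, not the paper's method.
    Returns (codes, scale, zero, outliers).
    """
    n = len(weights)
    n_out = int(n * outlier_fraction)
    # Indices sorted by descending magnitude; the first n_out are outliers.
    order = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    outliers = {i: weights[i] for i in order[:n_out]}
    # The quantization range is computed from the non-outlier weights only.
    dense = [w for i, w in enumerate(weights) if i not in outliers]
    lo, hi = min(dense), max(dense)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels or 1.0  # guard against a zero range
    codes = [0 if i in outliers else round((w - lo) / scale)
             for i, w in enumerate(weights)]
    return codes, scale, lo, outliers


def dequantize(codes, scale, zero, outliers):
    """Rebuild weights: outliers are restored exactly, the rest approximately."""
    return [outliers.get(i, zero + c * scale) for i, c in enumerate(codes)]


def max_error(weights, **kwargs):
    """Largest absolute reconstruction error for the given settings."""
    recon = dequantize(*quantize_with_outliers(weights, **kwargs))
    return max(abs(a - b) for a, b in zip(weights, recon))


if __name__ == "__main__":
    w = [0.1, -0.2, 0.05, 0.3, -0.15, 8.0, 0.0, -0.25]
    print("no outliers isolated: ", max_error(w, outlier_fraction=0.0))
    print("one outlier isolated: ", max_error(w, outlier_fraction=0.125))
```

In this toy example the single large weight (8.0) stretches the quantization range when it is left in, so every other weight is rounded coarsely. Storing it separately shrinks the range and the rounding error for the rest.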
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Ferroelectric and Negative Capacitance Devices
