Compression Scaling Laws: Unifying Sparsity and Quantization
Elias Frantar, Utku Evci, Wonpyo Park, Neil Houlsby, Dan Alistarh

TL;DR
This paper places compression techniques such as sparsity and quantization for large language models under a common scaling law framework, showing how each one affects scaling behavior during pretraining.
Contribution
It extends previous scaling law work to include quantization, showing how different compression methods can be compared and combined within a unified theoretical framework.
Findings
Weight sparsity acts as a constant multiplier on model size.
Weight-only quantization provides strong parameter efficiency.
Full quantization shows diminishing returns at lower bitwidths.
Abstract
We investigate how different compression techniques -- such as weight and activation quantization, and weight sparsity -- affect the scaling behavior of large language models (LLMs) during pretraining. Building on previous work showing that weight sparsity acts as a constant multiplier on model size in scaling laws, we demonstrate that this "effective parameter" scaling pattern extends to quantization as well. Specifically, we establish that weight-only quantization achieves strong parameter efficiency multipliers, while full quantization of both weights and activations shows diminishing returns at lower bitwidths. Our results suggest that different compression techniques can be unified under a common scaling law framework, enabling principled comparison and combination of these methods.
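The "effective parameter" idea in the abstract can be illustrated with a small sketch. It assumes a Chinchilla-style loss form, L(N, D) = E + A/N^alpha + B/D^beta, with the publicly reported Chinchilla coefficients (Hoffmann et al., 2022); these are not fits from this paper. The names `eff`, `loss`, and `multiplier_from_loss`, and the example multiplier of 0.5, are hypothetical and chosen only for illustration.

```python
import math

# Chinchilla-style coefficients (Hoffmann et al., 2022). Assumed for
# illustration; they are NOT the fitted values from this paper.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def loss(n_params, n_tokens, eff=1.0):
    """Predicted loss when compression scales the parameter count by `eff`.

    `eff` is a hypothetical multiplier: a compressed model with `n_params`
    parameters is treated as a dense model with `eff * n_params` parameters.
    """
    n_eff = eff * n_params
    return E + A / n_eff**ALPHA + B / n_tokens**BETA


def multiplier_from_loss(observed_loss, n_params, n_tokens):
    """Invert the law: which multiplier explains an observed loss?"""
    param_term = observed_loss - E - B / n_tokens**BETA
    n_eff = (A / param_term) ** (1.0 / ALPHA)
    return n_eff / n_params


# A 1B-parameter model with a hypothetical multiplier of 0.5 is predicted
# to match a dense 500M-parameter model trained on the same tokens.
compressed = loss(1e9, 2e10, eff=0.5)
dense_half = loss(5e8, 2e10)
recovered = multiplier_from_loss(compressed, 1e9, 2e10)
```

Under this view, comparing compression methods reduces to comparing their multipliers: a format with a larger multiplier retains more of the dense model's capacity at the same parameter count.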
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
