Scaling Laws for Post Training Quantized Large Language Models
Zifei Xu, Alexander Lan, Wanzin Yazar, Tristan Webb, Sayeh Sharify, Xin Wang

TL;DR
This paper investigates the scaling behavior of large language models after post-training quantization, revealing predictable patterns and proposing a statistical model to forecast quantized model performance.
Contribution
It provides the first systematic empirical analysis of post-training quantization effects on LLMs and introduces a model to predict quantized model quality based on scaling factors.
Findings
The performance of post-training quantized LLMs can be predicted reasonably well by a statistical model.
Key scaling factors relate to the local loss landscape characteristics.
Performance variability across models and quantization types is systematically characterized.
Abstract
Generalization abilities of well-trained large language models (LLMs) are known to scale predictably as a function of model size. In contrast to the existence of practical scaling laws governing pre-training, the quality of LLMs after post-training compression remains highly unpredictable, often requiring case-by-case validation in practice. In this work, we attempted to close this gap for post-training weight quantization of LLMs by conducting a systematic empirical study on multiple LLM families quantized to numerous low-precision tensor data types using popular weight quantization techniques. We identified key scaling factors pertaining to characteristics of the local loss landscape, based on which the performance of quantized LLMs can be reasonably well predicted by a statistical model.
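As a rough illustration of what post-training weight quantization does to a tensor, the sketch below applies symmetric, per-tensor round-to-nearest quantization to a small list of weights and measures the resulting error. This is an assumption chosen for simplicity, not the specific quantization techniques or data types studied in the paper; the function names `quantize_rtn` and `mean_squared_error` and the example weights are hypothetical.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric per-tensor round-to-nearest quantization (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1  # largest positive integer level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Map each weight to its nearest integer level, clamped to the signed range.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    # Dequantize to see which values the model would actually compute with.
    dequantized = [c * scale for c in codes]
    return codes, scale, dequantized


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weights, chosen only to make the rounding easy to follow.
weights = [0.7, -0.32, 0.12, 0.0, -0.7]

codes4, scale4, deq4 = quantize_rtn(weights, bits=4)
codes8, scale8, deq8 = quantize_rtn(weights, bits=8)

mse4 = mean_squared_error(weights, deq4)
mse8 = mean_squared_error(weights, deq8)

print("4-bit codes:", codes4)
print("4-bit MSE:", mse4)
print("8-bit MSE:", mse8)
```

The weight perturbation grows as the bit width shrinks. How strongly such a perturbation degrades model quality is the kind of question the paper addresses with its scaling factors and statistical model.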
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
