Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu

TL;DR
This paper shows that low-bit quantization degrades undertrained large language models less than fully trained ones. It derives scaling laws that predict quantization-induced degradation and discusses the implications for future models trained with over 100 trillion tokens.
Contribution
It introduces scaling laws relating quantization-induced degradation (QiD) to the number of training tokens, model size, and bit width, and proposes using QiD as a measure of an LLM's training level.
Findings
Low-bit quantization favors undertrained LLMs.
Scaling laws predict quantization performance for models trained with 100T tokens.
Future models trained with far more tokens may not perform well under low-bit quantization.
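The trends in the findings above can be illustrated with a small sketch. It assumes a multiplicative power-law form in which QiD grows with training tokens and shrinks with model size and bit width. The functional form, the function names, and every coefficient below are illustrative assumptions chosen only to reproduce the qualitative trends. They are not the fitted values or necessarily the exact formula from the paper.

```python
# Hypothetical sketch of a QiD scaling law. The form and all coefficients
# are illustrative placeholders, NOT values reported in the paper.

def qid(n_params, n_tokens, bits, k=1.0, alpha=0.25, beta=0.5, gamma=5.0):
    """Assumed form: QiD = k * D**beta / (N**alpha * P**gamma).

    n_params (N): model size in parameters
    n_tokens (D): number of training tokens
    bits     (P): quantization bit width
    """
    return k * n_tokens**beta / (n_params**alpha * bits**gamma)


def tokens_for_qid(target_qid, n_params, bits,
                   k=1.0, alpha=0.25, beta=0.5, gamma=5.0):
    """Invert the assumed law: tokens at which QiD reaches target_qid."""
    return (target_qid * n_params**alpha * bits**gamma / k) ** (1.0 / beta)


if __name__ == "__main__":
    # Compare a small and a large model at the same token budget and bit width.
    for n in (1e9, 7e9, 70e9):
        print(f"N={n:.0e}  QiD(4-bit, 1T tokens) = {qid(n, 1e12, 4):.4f}")
```

Under these assumptions, `qid` rises as `n_tokens` grows and falls as `n_params` or `bits` grows, which matches the stated trend that smaller, heavily trained models suffer more. `tokens_for_qid` mirrors the idea of using a QiD level to estimate how many training tokens a model of a given size has effectively absorbed.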
Abstract
We reveal that low-bit quantization favors undertrained large language models (LLMs) by observing that models with larger sizes or fewer training tokens experience less quantization-induced degradation (QiD) when applying low-bit quantization, whereas smaller models with extensive training tokens suffer significant QiD. To gain deeper insights into this trend, we study over 1500 quantized LLM checkpoints of various sizes and at different training levels (undertrained or fully trained) in a controlled setting, deriving scaling laws for understanding the relationship between QiD and factors such as the number of training tokens, model size and bit width. With the derived scaling laws, we propose a novel perspective that we can use QiD to measure an LLM's training levels and determine the number of training tokens required for fully training LLMs of various sizes. Moreover, we use the…
Taxonomy
Topics: Advancements in Semiconductor Devices and Circuit Design · Semiconductor materials and devices · Advanced Data Storage Technologies
