Task-Stratified Knowledge Scaling Laws for Post-Training Quantized Large Language Models
Chenxi Zhou, Pengfei Cao, Jiang Li, Bohan Yu, Jinyu Ye, Jun Zhao, and Kang Liu

TL;DR
This paper introduces task-stratified scaling laws for post-training quantized large language models, revealing how different knowledge capabilities are affected by quantization and guiding better quantization strategies.
Contribution
It develops a unified framework that models the impact of model size, bit-width, and fine-grained factors on various knowledge capabilities in PTQ, validated across diverse configurations.
Findings
Reasoning is most sensitive to quantization precision.
Application capabilities scale with model size and bit-width.
Memorization is sensitive to calibration set size.
Abstract
Post-Training Quantization (PTQ) is a critical strategy for the efficient deployment of Large Language Models (LLMs). However, existing scaling laws primarily focus on general performance, overlooking crucial fine-grained factors and how quantization differentially impacts diverse knowledge capabilities. To address this, we establish Task-Stratified Knowledge Scaling Laws. By stratifying capabilities into memorization, application, and reasoning, we develop a framework that unifies model size, bit-width, and two fine-grained factors: group size and calibration set size. Validated on 293 diverse PTQ configurations, our framework demonstrates strong fit and cross-architecture consistency. It reveals distinct sensitivities across knowledge capabilities: reasoning is precision-critical, application is scale-responsive, and memorization is calibration-sensitive. We highlight that in low-bit scenarios,…
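To make the idea of a scaling law over model size, bit-width, group size, and calibration set size concrete, the sketch below fits a generic multiplicative power law by least squares in log space. The functional form, the exponent names, and the synthetic data are all assumptions chosen for illustration; the abstract does not state the paper's actual formula, and nothing here reproduces its results.

```python
import itertools
import numpy as np

def fit_power_law(N, B, G, C, y):
    """Fit y ~ A * N^a * B^b * G^g * C^c via linear least squares on logs.

    Hypothetical form for illustration only; not the paper's fitted law.
    N: model size, B: bit-width, G: group size, C: calibration set size.
    """
    N, B, G, C, y = (np.asarray(v, dtype=float) for v in (N, B, G, C, y))
    # Design matrix: intercept (log A) plus one log-feature per factor.
    X = np.column_stack([np.ones_like(N), np.log(N), np.log(B), np.log(G), np.log(C)])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return {
        "A": float(np.exp(coef[0])),
        "alpha_N": float(coef[1]),
        "alpha_B": float(coef[2]),
        "alpha_G": float(coef[3]),
        "alpha_C": float(coef[4]),
    }

# Synthetic, noise-free configurations (made-up values, not measurements).
grid = list(itertools.product(
    [1e9, 7e9, 13e9],   # model sizes
    [2, 3, 4, 8],       # bit-widths
    [32, 64, 128],      # group sizes
    [32, 128, 512],     # calibration set sizes
))
N, B, G, C = (np.array(col, dtype=float) for col in zip(*grid))

# Generate a synthetic "loss" from known exponents so the fit can be checked.
true = {"A": 2.0, "alpha_N": -0.1, "alpha_B": -0.5, "alpha_G": 0.05, "alpha_C": -0.02}
y = (true["A"] * N ** true["alpha_N"] * B ** true["alpha_B"]
     * G ** true["alpha_G"] * C ** true["alpha_C"])

fit = fit_power_law(N, B, G, C, y)
```

Under this assumed form, a task-stratified analysis would amount to running one such fit per capability (memorization, application, reasoning) and comparing the fitted exponents across them.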
