Task-Stratified Knowledge Scaling Laws for Post-Training Quantized Large Language Models

Chenxi Zhou, Pengfei Cao, Jiang Li, Bohan Yu, Jinyu Ye, Jun Zhao, and Kang Liu

arXiv:2508.18609 · cs.CL · April 23, 2026

TL;DR

This paper introduces task-stratified scaling laws for post-training quantized large language models, revealing how different knowledge capabilities are affected by quantization and guiding better quantization strategies.

Contribution

The paper develops a unified framework that models how model size, bit-width, and two fine-grained factors (group size and calibration set size) affect memorization, application, and reasoning capabilities under PTQ. The framework is validated on 293 diverse PTQ configurations.

Findings

01. Reasoning is most sensitive to quantization precision.

02. Application capabilities scale with model size and bit-width.

03. Memorization is sensitive to calibration set size.

Abstract

Post-Training Quantization (PTQ) is a critical strategy for efficient Large Language Model (LLM) deployment. However, existing scaling laws primarily focus on general performance, overlooking crucial fine-grained factors and the different ways quantization affects diverse knowledge capabilities. To address this, we establish Task-Stratified Knowledge Scaling Laws. By stratifying capabilities into memorization, application, and reasoning, we develop a framework that unifies model size, bit-width, and two fine-grained factors: group size and calibration set size. Validated on 293 diverse PTQ configurations, our framework demonstrates strong fit and cross-architecture consistency. It reveals distinct sensitivities across knowledge capabilities: reasoning is precision-critical, application is scale-responsive, and memorization is calibration-sensitive. We highlight that in low-bit scenarios,…
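
To make the idea of a unified scaling law concrete, here is a minimal sketch of how one might fit a law over the four factors named in the abstract. The abstract does not give the paper's functional form, so the multiplicative power law below, the function name `fit_power_law`, and all numeric values are assumptions for illustration only. The data is synthetic and noise-free, generated from made-up exponents, and does not reproduce any result from the paper.

```python
import numpy as np

def fit_power_law(N, B, G, C, y):
    """Fit log y = k + a*log N + b*log B + g*log G + c*log C by least squares.

    ASSUMPTION: a multiplicative power law in model size N, bit-width B,
    group size G, and calibration set size C. This is a hypothetical form,
    not the one used in the paper.
    """
    X = np.column_stack([
        np.ones_like(N, dtype=float),  # intercept column (log of the prefactor)
        np.log(N),
        np.log(B),
        np.log(G),
        np.log(C),
    ])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return coef  # [k, a, b, g, c]

# Synthetic configuration grid (illustrative values, not the paper's 293 configs).
sizes = np.array([1e9, 3e9, 7e9, 13e9])   # parameters
bits = np.array([2.0, 3.0, 4.0, 8.0])     # bit-width
groups = np.array([32.0, 64.0, 128.0])    # quantization group size
calib = np.array([64.0, 128.0, 256.0])    # calibration set size
N, B, G, C = [a.ravel() for a in np.meshgrid(sizes, bits, groups, calib, indexing="ij")]

# Made-up "true" coefficients used only to generate the synthetic loss values.
true_coef = np.array([np.log(50.0), -0.2, -1.5, 0.1, -0.05])
y = np.exp(true_coef[0]) * N**true_coef[1] * B**true_coef[2] * G**true_coef[3] * C**true_coef[4]

coef = fit_power_law(N, B, G, C, y)
print(dict(zip(["k", "a_size", "b_bits", "g_group", "c_calib"], np.round(coef, 4))))
```

Under this assumed form, a task-stratified analysis would amount to running the same fit separately on memorization, application, and reasoning scores and comparing the fitted exponents. A larger-magnitude bit-width exponent for one task category would correspond to what the abstract calls a precision-critical capability.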
