Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos
Haoyu Zhang, Shihao Zhang, Ian Colbert, Rayan Saab

TL;DR
This paper provides the first quantitative theoretical error bounds for the post-training quantization algorithms OPTQ and Qronos, helping explain their empirical success and guiding practical parameter choices.
Contribution
It derives quantitative error bounds for OPTQ and Qronos that give theoretical justification for their design choices and clarify why they perform well in neural network quantization.
Findings
Derived non-asymptotic 2-norm error bounds for OPTQ that depend explicitly on the calibration data and the regularization parameter.
Established stronger infinity-norm error bounds for stochastic variants of OPTQ.
Provided theoretical explanations for empirical advantages of Qronos.
Abstract
Post-training quantization (PTQ) has become a crucial tool for reducing the memory and compute costs of modern deep neural networks, including large language models (LLMs). Among PTQ algorithms, the OPTQ framework (also known as GPTQ) has emerged as a leading method due to its computational efficiency and strong empirical performance. Despite its widespread adoption, however, OPTQ lacks rigorous quantitative theoretical guarantees. This paper presents the first quantitative error bounds for both deterministic and stochastic variants of OPTQ, as well as for Qronos, a recent, related state-of-the-art PTQ algorithm. We analyze how OPTQ's iterative procedure induces quantization error and derive non-asymptotic 2-norm error bounds that depend explicitly on the calibration data and on a regularization parameter that OPTQ uses. Our analysis provides theoretical justification for several practical…
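To make the "iterative procedure" concrete, here is a minimal NumPy sketch of a column-by-column OPTQ/GPTQ-style quantizer. It is an illustration under stated assumptions, not the paper's formulation: it assumes a linear layer Y = X Wᵀ, a regularized Hessian H = XᵀX + λI (with λ standing in for the regularization parameter), an unbounded uniform grid with spacing `step`, and the Cholesky-based error-feedback form used by the public GPTQ code. The names `optq`, `quantize_rtn`, `quantize_stochastic`, and `lam` are hypothetical, and blocking, bounded alphabets, and Qronos are not modeled.

```python
import numpy as np


def quantize_rtn(w, step):
    """Deterministic round-to-nearest onto the (assumed unbounded) grid step * Z."""
    return step * np.round(w / step)


def quantize_stochastic(w, step, rng):
    """Unbiased stochastic rounding onto step * Z, so that E[q] = w."""
    lower = step * np.floor(w / step)
    prob_up = (w - lower) / step  # fractional position inside the grid cell
    return lower + step * (rng.random(w.shape) < prob_up)


def optq(W, X, step, lam=1e-2, stochastic=False, seed=0):
    """Hypothetical sketch of OPTQ/GPTQ-style sequential quantization.

    W: (n_out, n_in) layer weights; X: (n_samples, n_in) calibration inputs.
    Input columns are quantized one at a time; each column's rounding error
    is fed back into the columns that have not been quantized yet.
    """
    rng = np.random.default_rng(seed)
    W = np.array(W, dtype=float)  # work on a copy
    n_in = W.shape[1]
    H = X.T @ X + lam * np.eye(n_in)  # assumed regularized Hessian proxy
    # Upper-triangular U with H^{-1} = U^T U (Cholesky form of the OBS update).
    U = np.linalg.cholesky(np.linalg.inv(H)).T
    Q = np.zeros_like(W)
    for j in range(n_in):
        w = W[:, j]
        q = quantize_stochastic(w, step, rng) if stochastic else quantize_rtn(w, step)
        Q[:, j] = q
        # Distribute the scaled rounding error over the remaining columns.
        err = (w - q) / U[j, j]
        W[:, j + 1:] -= np.outer(err, U[j, j + 1:])
    return Q
```

In this sketch, if H is diagonal (uncorrelated calibration features) the row entries U[j, j+1:] vanish and the procedure reduces to plain rounding; the off-diagonal structure of the calibration data is what lets later columns compensate for earlier rounding errors, while λ keeps H invertible even when XᵀX is singular.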
