Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
Shihao Zhang, Haoyu Zhang, Ian Colbert, Rayan Saab

TL;DR
Qronos is a novel post-training quantization algorithm that iteratively corrects errors in neural network weights and activations, leading to improved model performance in language generation tasks.
Contribution
Qronos introduces a new iterative, optimization-based approach for post-training quantization that explicitly corrects quantization errors and surpasses existing data-driven methods.
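To make "quantization error" concrete, here is a minimal sketch of a symmetric, per-tensor, round-to-nearest uniform quantizer. Quantizers of this kind are commonly applied to weights, activations, and KV caches. The helper name `fake_quantize` and the per-tensor scaling choice are illustrative assumptions, not the quantizer configuration used in the paper. The difference between the input and the returned values is the error that a method like Qronos tries to compensate for.

```python
import numpy as np


def fake_quantize(x: np.ndarray, bits: int) -> tuple[np.ndarray, float]:
    """Hypothetical helper: symmetric per-tensor round-to-nearest quantization.

    Maps x onto the grid {-qmax, ..., qmax} * scale with qmax = 2**(bits-1) - 1
    and returns the dequantized ("fake-quantized") values plus the step size.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0  # step size of the uniform grid
    codes = np.clip(np.round(x / scale), -qmax, qmax)  # integer codes
    return codes * scale, scale


x = np.array([-1.0, -0.3, 0.0, 0.26, 1.0])
x_q, scale = fake_quantize(x, bits=4)
error = x - x_q  # quantization error; each entry is at most scale / 2 in magnitude
```

With 4 bits the grid has 15 levels, so every value is snapped to a multiple of `scale = 1/7` here. Real PTQ pipelines often use per-channel or per-group scales instead of a single per-tensor one.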
Findings
Qronos outperforms previous state-of-the-art quantization methods on Llama3 models.
Qronos effectively corrects errors in weights, activations, and KV caches.
The algorithm is compatible with various transformation techniques.
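Hadamard-based incoherence processing relies on the fact that an orthogonal rotation can be folded into a layer's weights without changing its output, while spreading large activation outliers across many coordinates before quantization. The sketch below illustrates that invariance for a single linear layer using a normalized Sylvester-construction Hadamard matrix. The helper names are hypothetical, and the sketch shows only the generic transformation, not how the paper composes it with Qronos.

```python
import numpy as np


def sylvester_hadamard(n: int) -> np.ndarray:
    """Normalized n x n Hadamard matrix (n must be a power of two); H @ H.T == I."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])  # Sylvester construction doubles the size
    return H / np.sqrt(n)


def rotate_layer(W: np.ndarray, x: np.ndarray):
    """Fold H into the weights and H.T into the input: (W H)(H.T x) == W x."""
    H = sylvester_hadamard(W.shape[1])
    return W @ H, H.T @ x


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 64))
x = np.zeros(64)
x[3] = 100.0  # a single large activation outlier
W_rot, x_rot = rotate_layer(W, x)
```

After rotation the layer output is unchanged, and the outlier's magnitude is spread evenly (every entry of `x_rot` has magnitude 100/8 = 12.5), so a uniform quantizer's grid is used far less wastefully. Weight-activation scaling equalization works analogously, with a diagonal matrix and its inverse in place of H and H.T.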
Abstract
We introduce Qronos, a new state-of-the-art post-training quantization algorithm that sequentially rounds and updates neural network weights. Qronos explicitly corrects not only errors due to both weight and activation quantization but also errors resulting from quantizing previous layers. Our iterative algorithm is based on an interpretable and disciplined optimization framework that subsumes and surpasses existing data-driven approaches. At each step, Qronos alternates between error correction and diffusion via optimal update rules. Importantly, we prove that Qronos admits an efficient implementation that uses the Cholesky decomposition to solve least-squares problems. We also demonstrate that Qronos is compatible with existing transformation techniques such as Hadamard-based incoherence processing and weight-activation scaling equalization, among others. We evaluate Qronos…
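The alternation between error correction and diffusion can be illustrated with a simplified greedy scheme for a single output neuron. Assume X holds the layer inputs seen by the full-precision model and X_tilde the inputs seen by the partially quantized model. The goal is then to make X_tilde @ q match X @ w, which is one reading of "correcting errors from previous layers." At each step the sketch (i) picks the best scalar for the current weight given everything else and rounds it onto the grid (correction), then (ii) re-fits the remaining unquantized weights by least squares (diffusion). This is an illustration under those assumptions, not the paper's exact update rules or its efficient Cholesky-based implementation. The function names are hypothetical, and the naive per-step `lstsq` call is far slower than what the abstract describes.

```python
import numpy as np


def round_to_grid(x, scale):
    """Round-to-nearest onto an (unbounded, for simplicity) uniform grid with step `scale`."""
    return scale * np.round(x / scale)


def correct_and_diffuse(X, X_tilde, w, scale):
    """Hypothetical greedy quantizer for one output neuron (illustration only).

    X:       (m, n) inputs seen by the full-precision layer
    X_tilde: (m, n) inputs seen by the partially quantized model
    w:       (n,)   full-precision weights
    Returns grid-valued q (n,) that tries to keep X_tilde @ q close to X @ w.
    """
    n = w.shape[0]
    target = X @ w                          # full-precision output to reproduce
    q = np.zeros(n)                         # quantized weights, filled left to right
    v = np.asarray(w, dtype=float).copy()   # working values of not-yet-quantized weights
    for t in range(n):
        # Correction: optimal scalar for column t with q[:t] and v[t+1:] fixed, then round.
        residual = target - X_tilde[:, :t] @ q[:t] - X_tilde[:, t + 1:] @ v[t + 1:]
        col = X_tilde[:, t]
        q[t] = round_to_grid(col @ residual / (col @ col), scale)
        # Diffusion: re-fit the remaining weights to absorb the error committed so far.
        if t + 1 < n:
            rest = target - X_tilde[:, : t + 1] @ q[: t + 1]
            v[t + 1:] = np.linalg.lstsq(X_tilde[:, t + 1:], rest, rcond=None)[0]
    return q


rng = np.random.default_rng(0)
X = rng.standard_normal((256, 16))
X_tilde = X + 0.05 * rng.standard_normal(X.shape)  # stand-in for drift from earlier layers
w = rng.standard_normal(16)

q = correct_and_diffuse(X, X_tilde, w, scale=0.25)
q_rtn = round_to_grid(w, 0.25)  # plain round-to-nearest baseline
err_greedy = np.linalg.norm(X @ w - X_tilde @ q)
err_rtn = np.linalg.norm(X @ w - X_tilde @ q_rtn)
```

Comparing `err_greedy` with `err_rtn` is a quick way to see how much of the input mismatch the correction and diffusion steps absorb relative to rounding each weight independently.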
Peer Reviews
Decision: ICLR 2026 Poster
1. Introduces a clear mismatch-aware formulation addressing activation drift across layers. 2. Provides an efficient implementation for the proposed algorithm. 3. Shows consistent empirical gains over prior PTQ methods across various models and benchmarks. 4. Can be easily used with various quantization algorithms and improve their performance. 5. Offers useful theoretical insights linking and generalizing existing algorithms like OPTQ.
1. The proposed method mainly extends OPTQ rather than introducing a fundamentally new algorithmic framework. 2. The improvement on weight-activation quantization appears smaller than that on weight-only quantization; additional analysis would help clarify the underlying reason. 3. The main paper reports results primarily for 3- and 4-bit settings, where existing methods already perform well. Including 2-bit quantization experiments would better demonstrate the robustness of the proposed approach…
1). The paper is reasonably well written. 2). The intuition and analysis are solid, helping to clarify the motivation behind the proposed techniques as well as offering insights into the strengths and weaknesses of the widely used GPTQ method. 3). The experimental evaluation is comprehensive, particularly in combination with state-of-the-art techniques such as rotation and MagR. 4). The results generally show improved accuracy compared to previous methods.
1). The overall contribution is incremental. 2). The method introduces additional computational overhead to achieve accuracy gains over GPTQ. Since GPTQ also uses an approximate inverse Hessian to balance accuracy and speed, it is difficult to conclude that this method is definitively better than GPTQ. 3). The improvements are limited for 8B models, and no results are reported for very large models. 4). Some claims, particularly regarding implementation speedup and memory savings, seem misleading…
+ I have found the idea of alternating between error correction and diffusion novel and sensible. + The reported experimental results are promising and seem to be among the current SOTA. + The appendix contains detailed proofs for the theoretical results reported in Sec. 3.
- The literary presentation is still lacking. For example, "GPTAQ has been observed to be unstable in other reproductions." reads oddly, and the paper's introduction section needs significant revision (too much emphasis on experimental results and too little space for explaining the motivation or significance). - Lemma 3.2 seems like a known result. I understand you provided a rigorous proof for this lemma, but I believe that the connection between LS and Cholesky decomposition has been known in…
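The reviews touch on two related points: Lemma 3.2's link between least squares (LS) and the Cholesky decomposition, and OPTQ/GPTQ's reliance on an (approximate) inverse Hessian. Both revolve around the Gram matrix H = A^T A of an LS problem min_x ||A x - b||. The sketch below shows the textbook route: factor H = L L^T once, then solve two triangular systems instead of forming H^-1 explicitly. The optional `damp` term mimics the diagonal dampening commonly added to H in OPTQ-style methods. It is an illustrative assumption, not the paper's setting, and this generic solver is not the paper's specific implementation.

```python
import numpy as np


def cholesky_lstsq(A: np.ndarray, b: np.ndarray, damp: float = 0.0) -> np.ndarray:
    """Solve min_x ||A x - b||^2 + damp * ||x||^2 via the normal equations.

    Factors H = A^T A + damp * I as L L^T (L lower triangular), then solves
    L y = A^T b followed by L^T x = y. np.linalg.solve is used for brevity;
    scipy.linalg.solve_triangular would exploit the triangular structure.
    """
    H = A.T @ A + damp * np.eye(A.shape[1])
    L = np.linalg.cholesky(H)        # requires H symmetric positive definite
    y = np.linalg.solve(L, A.T @ b)  # forward-substitution step
    return np.linalg.solve(L.T, y)   # back-substitution step


rng = np.random.default_rng(1)
A = rng.standard_normal((100, 8))
b = rng.standard_normal(100)
x_chol = cholesky_lstsq(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]  # SVD-based reference solution
```

Forming A^T A squares the condition number, so SVD- or QR-based solvers are safer for ill-conditioned problems. The Cholesky route pays off when the same Gram matrix is reused across many right-hand sides or nested subproblems, as in column-by-column quantization.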
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
