TL;DR
YAQA is an adaptive rounding algorithm for quantization that targets the error at the network's output rather than per-layer activation error. It comes with theoretical end-to-end error bounds and achieves lower quantization error than existing methods such as GPTQ and LDLQ.
Contribution
Introduces YAQA, an adaptive rounding algorithm for quantization backed by the first end-to-end error bounds for quantization algorithms, with supporting theoretical analysis and stronger empirical performance than prior methods.
Findings
YAQA reduces quantization error by approximately 30% compared to GPTQ/LDLQ.
YAQA achieves lower error than quantization-aware training (QAT).
Provides the first theoretical end-to-end error bounds for quantization algorithms.
Abstract
The goal of quantization is to produce a compressed model whose output distribution is as close to the original model's as possible. To do this tractably, most quantization algorithms minimize the immediate activation error of each layer as a proxy for the end-to-end error. However, this ignores the effect of future layers, making it a poor proxy. In this work, we introduce Yet Another Quantization Algorithm (YAQA), an adaptive rounding algorithm that directly considers the error at the network's output. YAQA introduces a series of theoretical results that culminate in the first end-to-end error bounds for quantization algorithms. First, we characterize the convergence time of adaptive rounding algorithms via the structure of their Hessian approximations. We then show that the end-to-end error can be bounded by the approximation's cosine similarity to the true Hessian. This admits a…
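To make the layer-wise proxy described in the abstract concrete, here is a minimal NumPy sketch of the baseline that YAQA is contrasted with, not of YAQA itself. It assumes the common proxy loss tr((Ŵ − W) H (Ŵ − W)ᵀ) with H estimated from layer inputs, and an LDLQ-style rounding rule in which each column is rounded after adding linear feedback from the errors of earlier columns. The function names, the uniform grid quantizer, and the toy sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def upper_ldl(H):
    """Factor H = (I + U) diag(D) (I + U)^T with U strictly upper triangular."""
    n = H.shape[0]
    L = np.linalg.cholesky(H[::-1, ::-1])  # lower Cholesky of the flipped matrix
    Uc = L[::-1, ::-1]                     # flipping back gives upper-triangular Uc, H = Uc @ Uc.T
    d = np.diag(Uc)
    U = Uc / d[None, :] - np.eye(n)        # scale columns to a unit diagonal, then drop it
    return U, d ** 2

def round_to_grid(x, step):
    """Nearest rounding onto a uniform grid (a stand-in for any quantizer)."""
    return step * np.round(x / step)

def round_with_feedback(W, H, step):
    """Round columns left to right, feeding earlier errors forward through U."""
    U, _ = upper_ldl(H)
    W_hat = np.zeros_like(W)
    for k in range(W.shape[1]):
        correction = (W[:, :k] - W_hat[:, :k]) @ U[:k, k]
        W_hat[:, k] = round_to_grid(W[:, k] + correction, step)
    return W_hat

def proxy_loss(W, W_hat, H):
    """Layer-wise activation error proxy: tr(E H E^T) with E = W_hat - W."""
    E = W_hat - W
    return np.trace(E @ H @ E.T)

# Toy layer with correlated inputs (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
m, n, step = 8, 16, 0.5
X = rng.standard_normal((256, n)) @ rng.standard_normal((n, n))
H = X.T @ X / X.shape[0]
W = rng.standard_normal((m, n))

W_nearest = round_to_grid(W, step)
W_feedback = round_with_feedback(W, H, step)
loss_nearest = proxy_loss(W, W_nearest, H)
loss_feedback = proxy_loss(W, W_feedback, H)
print(f"nearest: {loss_nearest:.4f}  feedback: {loss_feedback:.4f}")
```

With correlated inputs, the feedback variant would typically be expected to give a lower proxy loss than nearest rounding, although this is not guaranteed for every draw. Both variants only see the current layer's H, which is the limitation the abstract points to.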
Peer Reviews
Decision: Submitted to ICLR 2026
- The paper is easy to read, well structured, and clearly written. The main claims in the paper are well supported by rigorous theoretical proofs and extensive empirical evaluation.
- The paper provides novel contributions to a highly impactful area of model compression.
- The theoretical framework is strong and justifies the adoption of a Kronecker-factored Hessian that enables fast and symmetric I/O feedback during adaptive rounding.
- The paper provably shows superiority over LDLQ under the low
- It's not clear what the actual quantization time cost is. The paper states that YAQA adds no inference overhead and has the same asymptotic complexity as LDLQ. It would be good to show an actual wall-clock time comparison against LDLQ on a large model, since YAQA requires power iteration on the model Hessian.
- YAQA is provably shown to be better than LDLQ under the assumption that H_O is low rank. Could the authors show empirical evidence to support this assumption?
- The evaluations are mainly focused
- The experiments seem to be extensive, as in Tables 1-4.
- The theoretical analysis seems to be interesting and solid, but I am not good at math and theorem proving, and I am not capable of carefully checking the theorems and their correctness.
- In the introduction and background, the authors first introduce PTQ and QAT, and then abruptly switch to adaptive rounding algorithms. I am not sure, but it seems that adaptive rounding sits somewhere between these two methods (see lines 095-099 of the original paper). It might be better to state, here or elsewhere, that adaptive rounding is a new type of method that is similar to PTQ in not training on data, but that optimizes the quantized weights from the original full-precision model,
1. YAQA provides the first formal end-to-end quantization error bounds, linking quantization quality directly to the cosine similarity between the true and approximated Hessians.
2. Built on a Kronecker-factored Hessian approximation, YAQA is compatible with various quantizers and maintains the same computational complexity as prior adaptive rounding methods.
3. It introduces symmetric input–output feedback and structured Hessian forms, ensuring faster convergence and greater rounding stability
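The cosine similarity mentioned in these points can be illustrated on a toy problem. The sketch below assumes a small synthetic "true" Hessian over a row-major flattened m×n weight matrix and compares two approximations under the Frobenius inner product: a two-sided Kronecker product `H_out ⊗ H_in`, and an input-only form `I ⊗ H_in`, which is the shape a purely layer-wise proxy corresponds to. The nearest Kronecker product is computed with the standard rearrangement-plus-SVD construction. This is a convenience for the example and is not claimed to be how the paper estimates its factors; all names and sizes are hypothetical.

```python
import numpy as np

def cosine_similarity(H_true, H_approx):
    """Cosine of the angle between two matrices under the Frobenius inner product."""
    num = np.sum(H_true * H_approx)
    return num / (np.linalg.norm(H_true) * np.linalg.norm(H_approx))

def nearest_kronecker(H, m, n):
    """Best Frobenius-norm fit H ~ kron(H_out, H_in), H_out m x m, H_in n x n."""
    # Rearrange so that kron(A, B) becomes the rank-1 matrix vec(A) vec(B)^T.
    R = H.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)
    u, s, vt = np.linalg.svd(R, full_matrices=False)
    H_out = np.sqrt(s[0]) * u[:, 0].reshape(m, m)
    H_in = np.sqrt(s[0]) * vt[0].reshape(n, n)
    if np.trace(H_in) < 0:  # resolve the joint sign ambiguity of the SVD
        H_out, H_in = -H_out, -H_in
    return H_out, H_in

def random_psd(k, rng):
    """Random symmetric positive semidefinite matrix for the toy example."""
    A = rng.standard_normal((k, k))
    return A @ A.T / k

# Synthetic "true" Hessian: a dominant Kronecker term plus a smaller second one.
rng = np.random.default_rng(0)
m, n = 3, 4
H_true = (np.kron(random_psd(m, rng), random_psd(n, rng))
          + 0.2 * np.kron(random_psd(m, rng), random_psd(n, rng)))

H_out, H_in = nearest_kronecker(H_true, m, n)
cos_two_sided = cosine_similarity(H_true, np.kron(H_out, H_in))
cos_input_only = cosine_similarity(H_true, np.kron(np.eye(m), H_in))
print(f"two-sided: {cos_two_sided:.4f}  input-only: {cos_input_only:.4f}")
```

Because the SVD fit maximizes cosine similarity over all Kronecker products, the two-sided value is never below the input-only one in this construction. How large the gap is on real models is an empirical question that this toy example does not answer.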
1. Is the Kronecker-factored Hessian approximation transferable across different model architectures?
2. Can the “structural nilpotence degree” quantitatively predict convergence speed in real applications?
3. Can the proposed method be applied to mixed-precision quantization?
4. The paper claims that the proposed method introduces no additional inference overhead. Could the authors provide supporting evidence or justification for this claim?
5. The advantage of YAQA appears to mainly arise fro
