Model-Preserving Adaptive Rounding

Albert Tseng; Zhaofeng Sun; Christopher De Sa

arXiv:2505.22988·cs.LG·September 29, 2025

TL;DR

YAQA is an adaptive rounding algorithm for quantization that directly accounts for the error at the model's output instead of each layer's activation error. It comes with theoretical guarantees and reduces quantization error relative to existing methods such as GPTQ and LDLQ.

Contribution

Introduces YAQA, an adaptive rounding algorithm for quantization built on the first end-to-end error bounds for quantization algorithms, supported by theoretical analysis and superior empirical performance.

Findings

01. YAQA reduces quantization error by approximately 30% compared to GPTQ/LDLQ.
02. YAQA achieves lower error than quantization-aware training (QAT).
03. Provides the first theoretical end-to-end error bounds for quantization algorithms.

Abstract

The goal of quantization is to produce a compressed model whose output distribution is as close to the original model's as possible. To do this tractably, most quantization algorithms minimize the immediate activation error of each layer as a proxy for the end-to-end error. However, this ignores the effect of future layers, making it a poor proxy. In this work, we introduce Yet Another Quantization Algorithm (YAQA), an adaptive rounding algorithm that directly considers the error at the network's output. YAQA introduces a series of theoretical results that culminate in the first end-to-end error bounds for quantization algorithms. First, we characterize the convergence time of adaptive rounding algorithms via the structure of their Hessian approximations. We then show that the end-to-end error can be bounded by the approximation's cosine similarity to the true Hessian. This admits a…
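The layerwise proxy described above can be made concrete with a small sketch of LDLQ-style adaptive rounding, the kind of baseline YAQA is compared against. It minimizes tr((Ŵ - W) H (Ŵ - W)ᵀ), where H = E[x xᵀ] depends only on the layer's inputs, by decomposing H = (U + I) D (U + I)ᵀ with U strictly upper triangular and rounding one input column at a time while feeding back the error already committed. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the uniform 2-bit grid, the `scale` value, and the helper names `quantize_rtn`, `udu_decompose`, `ldlq`, and `proxy_loss` are all hypothetical.

```python
import numpy as np

def quantize_rtn(x, scale, bits=2):
    """Round-to-nearest onto a symmetric uniform grid (hypothetical quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    return scale * np.clip(np.round(x / scale), qmin, qmax)

def udu_decompose(H):
    """Return (U, d) with H = (U + I) diag(d) (U + I)^T and U strictly upper triangular."""
    L = np.linalg.cholesky(H[::-1, ::-1])  # Cholesky of the index-reversed matrix
    R = L[::-1, ::-1]                      # reversing back gives upper triangular R, H = R R^T
    s = np.diag(R)
    U = R / s - np.eye(H.shape[0])         # scale column j by 1/s[j] -> unit diagonal, drop I
    return U, s ** 2

def ldlq(W, H, scale, bits=2):
    """LDLQ-style adaptive rounding with input-side feedback only (hypothetical sketch)."""
    U, _ = udu_decompose(H)
    W_hat = np.zeros_like(W)
    for k in range(W.shape[1]):
        # Feedback: rounding error already committed in columns < k, pushed through U[:k, k].
        feedback = (W[:, :k] - W_hat[:, :k]) @ U[:k, k]
        W_hat[:, k] = quantize_rtn(W[:, k] + feedback, scale, bits)
    return W_hat

def proxy_loss(W, W_hat, H):
    """Layerwise proxy tr(E H E^T) with E = W_hat - W."""
    E = W_hat - W
    return np.trace(E @ H @ E.T)

rng = np.random.default_rng(0)
X = rng.standard_normal((512, 16)) @ rng.standard_normal((16, 16))  # synthetic correlated inputs
H = X.T @ X / X.shape[0]                                             # proxy Hessian E[x x^T]
W = rng.standard_normal((8, 16))                                     # synthetic weights
print("RTN :", proxy_loss(W, quantize_rtn(W, scale=0.5), H))
print("LDLQ:", proxy_loss(W, ldlq(W, H, scale=0.5), H))
```

Because U is strictly upper triangular, the feedback for column k only uses columns already rounded, and the proxy loss reduces to a D-weighted sum of per-column rounding residuals. Nothing in this objective sees layers after the current one, which is the limitation the abstract points out.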

Peer Reviews

Decision: Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- The paper is easy to read, well structured and clearly written. The main claims in the paper are well supported by rigorous theoretical proofs and extensive empirical evaluation.
- The paper provides novel contributions to a highly impactful area of model compression.
- The theoretical framework is strong and justifies the adoption of Kronecker factored Hessian that enables fast and symmetric I/O feedback during adaptive rounding.
- The paper provably shows superiority over LDLQ under the low
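A short numerical check shows why a Kronecker-factored Hessian brings both the input and the output side of a layer into play. Assuming the Hessian over a layer's weights is approximated as H ≈ H_O ⊗ H_I, with H_O acting on output rows and H_I on input columns (the factor ordering and NumPy's row-major vec are our assumptions), the quadratic error vec(E)ᵀ H vec(E) equals tr(Eᵀ H_O E H_I), which needs only the two small factors. The matrices below are synthetic and the helper names are hypothetical.

```python
import numpy as np

def random_spd(n, rng):
    """Random symmetric positive-definite matrix (synthetic stand-in for a Hessian factor)."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

def kron_quadratic_dense(E, H_O, H_I):
    """vec(E)^T (H_O kron H_I) vec(E), materializing the full (mn x mn) matrix.

    Uses NumPy's row-major vec, so rows of E index outputs and columns index inputs.
    """
    v = E.reshape(-1)
    return v @ np.kron(H_O, H_I) @ v

def kron_quadratic_factored(E, H_O, H_I):
    """Same quantity from the small factors only: tr(E^T H_O E H_I)."""
    return np.sum(E * (H_O @ E @ H_I))

rng = np.random.default_rng(0)
m, n = 4, 6                          # hypothetical output / input dimensions
E = rng.standard_normal((m, n))      # stand-in for a weight error W_hat - W
H_O, H_I = random_spd(m, rng), random_spd(n, rng)
print(kron_quadratic_dense(E, H_O, H_I), kron_quadratic_factored(E, H_O, H_I))
```

Setting H_O = I recovers the input-only layerwise proxy tr(E H_I Eᵀ) that methods such as LDLQ minimize; a non-identity H_O is what lets the output side influence rounding.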

Weaknesses

- It's not clear what the actual quantization time cost is. The paper states that YAQA adds no inference overhead and has the same asymptotic complexity as LDLQ. It would be good to show an actual wall-clock time comparison against LDLQ on large models, since YAQA requires power iteration on the model Hessian.
- YAQA is provably shown to be better than LDLQ under the assumption that H_O is low rank. Could the authors show empirical evidence to support this assumption?
- The evaluations are mainly focused
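The low-rank question about H_O can in principle be checked by inspecting the spectrum of an estimated H_O. A minimal sketch, assuming such an estimate is available as a symmetric PSD NumPy array: measure the fraction of the trace carried by the top-k eigenvalues, or the smallest k that reaches a chosen threshold. The matrix here is a synthetic, nearly rank-3 stand-in, not data from the paper; `top_k_energy`, `effective_rank`, and the 99% threshold are hypothetical choices.

```python
import numpy as np

def top_k_energy(H, k):
    """Fraction of trace(H) captured by the k largest eigenvalues of a symmetric PSD H."""
    eig = np.linalg.eigvalsh(H)[::-1]   # eigvalsh returns ascending order; flip to descending
    eig = np.clip(eig, 0.0, None)       # clip tiny negative round-off
    return eig[:k].sum() / eig.sum()

def effective_rank(H, threshold=0.99):
    """Smallest k whose top-k eigenvalues hold at least `threshold` of the trace."""
    eig = np.clip(np.linalg.eigvalsh(H)[::-1], 0.0, None)
    cum = np.cumsum(eig) / eig.sum()
    return int(np.searchsorted(cum, threshold) + 1)

rng = np.random.default_rng(0)
m, r = 32, 3                            # hypothetical size and true rank
G = rng.standard_normal((m, r))
H_O = G @ G.T + 1e-6 * np.eye(m)        # synthetic, nearly low-rank stand-in for H_O
print(top_k_energy(H_O, r), effective_rank(H_O))
```

On a real model, one would run the same check on per-layer estimates of H_O; a small effective rank relative to the output dimension would support the assumption.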

Reviewer 02 · Rating 4 · Confidence 2

Strengths

- The experiments seem to be extensive, as in Tables 1-4.
- The theoretical analysis seems to be interesting and solid, but I am not good at math and theorem proof, and I am not capable of carefully checking the theorems and their correctness.

Weaknesses

- In the introduction and background, the authors first introduce PTQ and QAT and then abruptly switch to adaptive rounding algorithms. I am not sure, but adaptive rounding seems to sit somewhere between these two approaches (see lines 095-099 of the original paper). It might be better to state explicitly that adaptive rounding is a type of method similar to PTQ that does not train on data but instead optimizes the quantized weights starting from the original full-precision model,

Reviewer 03 · Rating 4 · Confidence 2

Strengths

1. YAQA provides the first formal end-to-end quantization error bounds, linking quantization quality directly to the cosine similarity between the true and approximated Hessians.
2. Built on a Kronecker-factored Hessian approximation, YAQA is compatible with various quantizers and maintains the same computational complexity as prior adaptive rounding methods.
3. It introduces symmetric input–output feedback and structured Hessian forms, ensuring faster convergence and greater rounding stability.
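To make the cosine-similarity quantity concrete, the sketch below fits a single Kronecker product A ⊗ B to a synthetic symmetric matrix and reports the Frobenius cosine similarity ⟨H, A ⊗ B⟩ / (‖H‖ ‖A ⊗ B‖). It uses the textbook nearest-Kronecker-product construction (the Van Loan-Pitsianis rearrangement, whose best rank-1 factor is found by power iteration). This is a swapped-in standard technique for illustration, not a claim about how YAQA estimates its factors; `rearrange`, `nearest_kron`, `cosine_similarity`, and the problem sizes are hypothetical.

```python
import numpy as np

def rearrange(H, m, n):
    """Van Loan-Pitsianis rearrangement: A kron B becomes the rank-1 matrix vec(A) vec(B)^T."""
    return H.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)

def nearest_kron(H, m, n, iters=100, seed=0):
    """Fit H ~ A kron B (A: m x m, B: n x n) in Frobenius norm via power iteration."""
    R = rearrange(H, m, n)
    b = np.random.default_rng(seed).standard_normal(n * n)
    for _ in range(iters):
        a = R @ b
        a /= np.linalg.norm(a)   # unit left singular vector estimate
        b = R.T @ a              # b absorbs the singular value
    return a.reshape(m, m), b.reshape(n, n)

def cosine_similarity(X, Y):
    """Frobenius cosine similarity <X, Y>_F / (||X||_F ||Y||_F)."""
    return np.sum(X * Y) / (np.linalg.norm(X) * np.linalg.norm(Y))

rng = np.random.default_rng(0)
m, n = 3, 4                      # hypothetical factor sizes

def random_spd(k):
    """Random symmetric positive-definite matrix (synthetic)."""
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

# Synthetic "true" Hessian that is close to, but not exactly, a Kronecker product.
H_true = np.kron(random_spd(m), random_spd(n)) + random_spd(m * n)
A, B = nearest_kron(H_true, m, n)
print("cosine similarity:", cosine_similarity(np.kron(A, B), H_true))
```

Power iteration only needs products with the rearranged matrix and its transpose, which is why it is a natural choice when the full Hessian is too large to decompose directly.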

Weaknesses

1. Is the Kronecker-factored Hessian approximation transferable across different model architectures?
2. Can the “structural nilpotence degree” quantitatively predict convergence speed in real applications?
3. Can the proposed method be applied to mixed-precision quantization?
4. The paper claims that the proposed method introduces no additional inference overhead; could the authors provide supporting evidence or justification for this claim?
5. The advantage of YAQA appears to mainly arise fro

Code & Models

Models

Sayankotor/FastKronQuantization (Hugging Face model)