Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs

Yoonjun Cho; Dongjae Jeon; Soeun Kim; Moongyu Jeon; Albert No

arXiv:2602.02001 · cs.LG · May 14, 2026

TL;DR

This paper introduces Structured Residual Reconstruction (SRR), a rank-allocation framework for quantization error reconstruction in large language models that improves accuracy in post-training quantization and stability in quantized fine-tuning.

Contribution

SRR splits a fixed rank budget between preserving the dominant singular subspace of the weights and reconstructing quantization error, selects the split with a theory-guided criterion, and improves the stability and performance of quantized fine-tuning.

Findings

01

Consistent perplexity reductions across diverse models and quantization settings.

02

A 5.9 percentage-point average gain on GLUE with 2-bit quantized parameter-efficient fine-tuning (QPEFT).

03

The project page is available at https://ai-isl.github.io/srr.

Abstract

Quantization Error Reconstruction (QER) reduces accuracy loss in Post-Training Quantization (PTQ) by approximating weights as $W \approx Q + LR$, using a rank-$r$ correction to reconstruct quantization error. Prior methods devote the full rank budget to error reconstruction, which is suboptimal when $W$ has intrinsic low-rank structure and quantization corrupts dominant directions. We propose Structured Residual Reconstruction (SRR), a rank-allocation framework that preserves the top-$k$ singular subspace of the activation-scaled weight before quantization, quantizes only the residual, and uses the remaining rank $r - k$ for error reconstruction. We derive a theory-guided criterion for selecting $k$ by balancing quantization-exposed energy and unrecoverable error under rank constraints. We further show that resulting $\mathbf{Q} +…
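
The preserve, quantize, reconstruct pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the quantizer (`quantize_rtn`, a per-tensor min-max round-to-nearest scheme), the diagonal activation scaling `act_scale`, and all function names are hypothetical stand-ins, and `k` is supplied by hand rather than chosen by the paper's criterion.

```python
import numpy as np


def quantize_rtn(x, bits=4):
    """Stand-in quantizer (assumption): per-tensor min-max round-to-nearest."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**bits - 1) if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo


def truncated_svd(m, rank):
    """Return (A, B) such that A @ B is the best rank-`rank` approximation of m."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]


def srr_sketch(w, r, k, bits=4, act_scale=None):
    """Approximate w (shape: in x out) as Q + L @ R with total rank budget r.

    k of the r ranks preserve the top-k singular subspace of the scaled
    weight; the remaining r - k ranks reconstruct the quantization error.
    act_scale is an assumed positive per-input-channel scaling vector.
    """
    if not 0 <= k <= r:
        raise ValueError("k must satisfy 0 <= k <= r")
    s = np.ones(w.shape[0]) if act_scale is None else np.asarray(act_scale, dtype=float)

    # 1) Preserve: top-k subspace of the scaled weight, mapped back to weight space.
    l1, r1 = truncated_svd(s[:, None] * w, k)
    l1 = l1 / s[:, None]

    # 2) Quantize only the residual left after removing the preserved part.
    q = quantize_rtn(w - l1 @ r1, bits)

    # 3) Reconstruct: spend the remaining r - k ranks on the quantization error.
    err = w - l1 @ r1 - q
    l2, r2 = truncated_svd(s[:, None] * err, r - k)
    l2 = l2 / s[:, None]

    return q, np.hstack([l1, l2]), np.vstack([r1, r2])


# Toy weight with a strong low-rank component plus noise (synthetic data).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 48))
w += 0.1 * rng.standard_normal((64, 48))

for k in (0, 4):
    q, l, rr = srr_sketch(w, r=8, k=k, bits=2)
    rel_err = np.linalg.norm(w - (q + l @ rr)) / np.linalg.norm(w)
    print(f"k={k}: relative Frobenius error = {rel_err:.4f}")
```

Setting `k = 0` reduces the sketch to conventional QER, where the whole rank budget goes to error reconstruction, so the two allocations can be compared on the same matrix.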

Code & Models

Repositories

https://ai-isl.github.io/srr
