TL;DR
This paper introduces Structured Residual Reconstruction (SRR), a rank-allocation framework for quantization error correction in large language models. It improves accuracy under post-training quantization and both stability and performance in quantized fine-tuning.
Contribution
SRR splits a fixed rank budget between preserving the dominant singular subspace of the weight and reconstructing quantization error, with a theory-guided criterion for choosing the split. It also improves the stability and performance of quantized fine-tuning.
Findings
Consistent perplexity reductions across diverse models and quantization settings.
A 5.9 percentage-point average gain on GLUE with 2-bit Quantized Parameter-Efficient Fine-Tuning (QPEFT).
The project page is available at https://ai-isl.github.io/srr.
Abstract
Quantization Error Reconstruction (QER) reduces accuracy loss in Post-Training Quantization (PTQ) by approximating weights as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$, using a rank-$r$ correction to reconstruct quantization error. Prior methods devote the full rank budget to error reconstruction, which is suboptimal when $\mathbf{W}$ has intrinsic low-rank structure and quantization corrupts dominant directions. We propose Structured Residual Reconstruction (SRR), a rank-allocation framework that preserves the top-$k$ singular subspace of the activation-scaled weight before quantization, quantizes only the residual, and uses the remaining rank for error reconstruction. We derive a theory-guided criterion for selecting $k$ by balancing quantization-exposed energy and unrecoverable error under rank constraints. We further show that resulting $\mathbf{Q} +…
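The three steps named in the abstract (preserve a top-k subspace, quantize the residual, spend the remaining rank on the error) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the activation scaling is assumed to be a positive per-input-channel vector `s`, the quantizer `fake_quantize` is a simple symmetric round-to-nearest stand-in, `k` is passed in by hand rather than chosen by the paper's criterion, and all function names are hypothetical.

```python
import numpy as np


def fake_quantize(w, bits=4):
    """Symmetric per-tensor round-to-nearest quantizer (illustrative stand-in, bits >= 2)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    if max_abs == 0:
        return w.copy()
    scale = max_abs / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def scaled_low_rank(m, s, rank):
    """Factors of the best rank-`rank` fit to m under the norm ||(.) @ diag(s)||_F."""
    # m * s broadcasts over columns, i.e. it equals m @ diag(s).
    u, sigma, vt = np.linalg.svd(m * s, full_matrices=False)
    left = u[:, :rank] * sigma[:rank]
    right = vt[:rank] / s  # undo the column scaling
    return left, right


def srr_sketch(w, s, r, k, bits=4):
    """Return (q, l, rt) with w approximated by q + l @ rt, where l @ rt has rank <= r.

    w: (out, in) weight; s: (in,) positive activation scales (an assumption);
    r: total rank budget; k: rank spent on the preserved subspace (0 <= k <= r).
    """
    if not 0 <= k <= r:
        raise ValueError("k must satisfy 0 <= k <= r")
    # 1. Preserve the top-k singular subspace of the activation-scaled weight.
    l1, r1 = scaled_low_rank(w, s, k)
    # 2. Quantize only the residual.
    residual = w - l1 @ r1
    q = fake_quantize(residual, bits)
    # 3. Spend the remaining r - k ranks on reconstructing the quantization error.
    l2, r2 = scaled_low_rank(residual - q, s, r - k)
    return q, np.hstack([l1, l2]), np.vstack([r1, r2])


rng = np.random.default_rng(0)
w = rng.standard_normal((16, 32))
s = rng.uniform(0.5, 2.0, size=32)
q, l, rt = srr_sketch(w, s, r=8, k=4)
w_hat = q + l @ rt  # quantized part plus rank-8 correction
```

Setting `k=0` in this sketch reduces it to plain error reconstruction with the full rank budget, which is the baseline the abstract contrasts SRR against.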
