MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization

Le Su; Xing Luo; Zhi Jin

arXiv:2605.17997·cs.LG·May 19, 2026

TL;DR

This paper introduces MARR, a module-adaptive residual reconstruction method for low-bit post-training quantization (PTQ). It scales the cross-layer residual per module to balance accumulated-error correction against the bias that the residual itself introduces.

Contribution

The paper proposes a novel module-specific residual scaling approach with an adaptive PID strategy to enhance quantization accuracy across different modules.

Findings

01. Achieves up to 20.2% performance gains on LLMs.
02. Achieves up to 4.6% relative gains on ViTs.
03. Demonstrates effectiveness under 4-bit quantization.

Abstract

Recently, residual reconstruction-based model quantization methods have achieved promising performance in low-bit post-training quantization (PTQ) by introducing cross-layer residuals to reduce error accumulated from previous layers. However, these residuals may also introduce additional bias arising from the Hessian-approximation (HA) assumption underlying reconstruction-based PTQ, leading to suboptimal quantization performance. In this work, we analyze that multiplying the residual term by a scaling coefficient provides a direct way to mitigate the HA bias associated with residual strength, while preserving accumulated-error correction. More importantly, we observe that this trade-off is module-dependent, making a single global residual strength insufficient to balance effective correction and residual-related bias across modules. Based on these observations, we propose Module-Adaptive…
