Rethinking Residual Errors in Compensation-based LLM Quantization

Shuaiting Li; Juncan Deng; Kedong Xu; Rongtao Deng; Hong Gu; Minghan Jiang; Haibin Shen; Kejie Huang

arXiv:2604.07955 · cs.LG · April 10, 2026

TL;DR

This paper revisits how the residual error is formulated in compensation-based LLM quantization. It aligns each quantized layer's output with the output of the full-precision model and adds a compensation-aware error, which improves on existing methods such as GPTQ and GPTAQ.

Contribution

The paper redefines the residual error objective and introduces a compensation-aware error, improving existing methods such as GPTQ and GPTAQ for LLM quantization.

Findings

01. Significant performance improvements across various LLMs and quantization settings.

02. Redefining the residual error objective improves alignment with full-precision outputs.

03. Incorporating compensation-aware error enhances quantization accuracy.

Abstract

Methods based on weight compensation, which iteratively apply quantization and weight compensation to minimize the output error, have recently demonstrated remarkable success in quantizing Large Language Models (LLMs). The representative work, GPTQ, introduces several key techniques that make such iterative methods practical for LLMs with billions of parameters. GPTAQ extends this approach by introducing an asymmetric calibration process that aligns the output of each quantized layer with its full-precision counterpart, incorporating a residual error into the weight compensation framework. In this work, we revisit the formulation of the residual error. We identify a sub-optimal calibration objective in existing methods: during the intra-layer calibration process, they align the quantized output with the output from compensated weights, rather than the true output from the original…
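
For readers unfamiliar with the baseline, the following is a minimal sketch of the generic quantize-then-compensate loop that the abstract attributes to GPTQ-style methods. It is not the paper's method and not the GPTQ reference implementation: the function names (`quantize_rtn`, `compensate_quantize`), the symmetric 4-bit per-row grid, the damping factor, and the explicit inverse-Hessian update (instead of GPTQ's Cholesky-based variant) are all assumptions chosen for readability. The residual error terms introduced by GPTAQ and revised in this paper are not modeled here.

```python
import numpy as np

def quantize_rtn(w, scale):
    """Round-to-nearest onto a symmetric 4-bit grid (an assumed grid, for illustration)."""
    return np.clip(np.round(w / scale), -8, 7) * scale

def compensate_quantize(W, X, damp=0.01):
    """Quantize W column by column, compensating the not-yet-quantized columns.

    W: (out_features, in_features) weights; X: (in_features, n_samples) calibration inputs.
    Simplified OBQ/GPTQ-style loop; all names and defaults are hypothetical.
    """
    W = W.astype(np.float64).copy()
    d = W.shape[1]
    # Proxy Hessian of the layer-wise output error, with damping for invertibility.
    H = X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(d)
    Hinv = np.linalg.inv(H)
    # Per-row scale fixed from the original weights.
    scale = np.maximum(np.abs(W).max(axis=1, keepdims=True) / 7.0, 1e-12)
    Q = np.zeros_like(W)
    for j in range(d):
        q = quantize_rtn(W[:, j:j + 1], scale)
        Q[:, j:j + 1] = q
        # Quantization error of column j, normalized by the inverse-Hessian diagonal.
        err = (W[:, j:j + 1] - q) / Hinv[j, j]
        # Weight compensation: shift the remaining columns to absorb the error.
        W[:, j + 1:] -= err @ Hinv[j:j + 1, j + 1:]
        # Remove column j from the inverse Hessian (rank-one elimination step).
        Hinv -= np.outer(Hinv[:, j], Hinv[j, :]) / Hinv[j, j]
    return Q

# Small demo on synthetic, correlated calibration data.
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 16)) @ rng.standard_normal((16, 256))
W = rng.standard_normal((8, 16))
Q = compensate_quantize(W, X)
scale = np.abs(W).max(axis=1, keepdims=True) / 7.0
print("output error, compensated:", np.linalg.norm(Q @ X - W @ X))
print("output error, round-to-nearest:", np.linalg.norm(quantize_rtn(W, scale) @ X - W @ X))
```

In this sketch the target of every update is the output of the progressively compensated weights. The abstract's criticism is aimed at that kind of intra-layer target: it argues that calibration should align with the true output of the original weights instead.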

Code & Models

Repositories

list0830/ResComp (GitHub)

Videos

Rethinking Residual Errors in Compensation-based LLM Quantization (SlidesLive)