First-Order Error Matters: Accurate Compensation for Quantized Large Language Models

Xingyu Zheng; Haotong Qin; Yuye Li; Haoran Chu; Jiakai Wang; Jinyang Guo; Michele Magno; Xianglong Liu

arXiv:2507.11017·cs.LG·November 17, 2025

TL;DR

This paper introduces FOEM, a post-training quantization method for large language models that explicitly incorporates first-order gradient terms into quantization error compensation, improving accuracy over the classical GPTQ baseline.

Contribution

FOEM explicitly includes first-order gradient terms in quantization error compensation, approximating them through a first-order Taylor expansion around the pre-quantization weights, and improves quantized model performance across various benchmarks.

Findings

1. FOEM reduces the perplexity of Llama3-8B by 17.3% in 3-bit quantization.

2. FOEM improves 5-shot MMLU accuracy from 53.8% to 56.1%.

3. FOEM outperforms classical GPTQ and combines well with SpinQuant.

Abstract

Post-training quantization (PTQ) offers an efficient approach to compressing large language models (LLMs), significantly reducing memory access and computational costs. Existing compensation-based weight calibration methods often rely on a second-order Taylor expansion to model quantization error, under the assumption that the first-order term is negligible in well-trained full-precision models. However, we reveal that the progressive compensation process introduces accumulated first-order deviations between latent weights and their full-precision counterparts, making this assumption fundamentally flawed. To address this, we propose FOEM, a novel PTQ method that explicitly incorporates first-order gradient terms to improve quantization error compensation. FOEM approximates gradients by performing a first-order Taylor expansion around the pre-quantization weights. This yields an…
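The role of the first-order term can be illustrated with a toy calculation. The sketch below is not the paper's algorithm: it assumes a purely quadratic surrogate loss with a hypothetical 2x2 Hessian `H`, hypothetical weights, and a zero gradient at the full-precision weights `w_fp`. It compares a second-order-only estimate of the loss increase caused by a rounding perturbation `delta` against an estimate that also includes the gradient at the drifted latent weights, approximated as `H @ (w_latent - w_fp)`.

```python
# Toy illustration only: a quadratic surrogate loss, not the FOEM implementation.
# All names and numbers here are hypothetical and chosen for readability.

def matvec(H, v):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(H, w_fp, w):
    """L(w) = 0.5 * (w - w_fp)^T H (w - w_fp), minimized at w_fp."""
    d = [a - b for a, b in zip(w, w_fp)]
    return 0.5 * dot(d, matvec(H, d))

def predicted_increase(H, w_fp, w_latent, delta, use_first_order):
    """Taylor estimate of L(w_latent + delta) - L(w_latent)."""
    second = 0.5 * dot(delta, matvec(H, delta))
    if not use_first_order:
        return second  # assumes the gradient at w_latent is negligible
    # Gradient at the latent weights, approximated by expanding around w_fp,
    # where the gradient is assumed to be zero: g(w_latent) ~ H (w_latent - w_fp).
    drift = [a - b for a, b in zip(w_latent, w_fp)]
    grad = matvec(H, drift)
    return dot(grad, delta) + second

H = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical positive-definite Hessian
w_fp = [0.30, -0.70]           # full-precision weights (zero gradient here)
w_latent = [0.36, -0.74]       # latent weights after earlier compensation steps
delta = [0.04, -0.04]          # perturbation from quantizing the latent weights

w_quant = [a + b for a, b in zip(w_latent, delta)]
true_increase = loss(H, w_fp, w_quant) - loss(H, w_fp, w_latent)
second_only = predicted_increase(H, w_fp, w_latent, delta, use_first_order=False)
with_first = predicted_increase(H, w_fp, w_latent, delta, use_first_order=True)

print("true increase:     ", true_increase)
print("second-order only: ", second_only)
print("with first-order:  ", with_first)
```

Because the surrogate is exactly quadratic, the estimate with the first-order term matches the true increase up to floating-point error, while the second-order-only estimate understates it once the latent weights have drifted away from `w_fp`. Real layer-wise objectives and the compensation update used by FOEM are more involved than this sketch.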
