Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs

Zifei Xu; Sayeh Sharify; Wanzin Yazar; Tristan Webb; Xin Wang

arXiv:2410.14570 · cs.LG · April 21, 2025

TL;DR

This paper investigates the challenges of low-precision post-training quantization for large language models, revealing that post-training quantization by local error minimization often underperforms quantization-aware fine-tuning, especially at very low precision.

Contribution

It demonstrates that the local (layer-wise) and global objectives in post-training quantization are misaligned, and emphasizes the importance of quantization-aware fine-tuning for effective low-precision compression.

Findings

01. Post-training quantization underperforms quantization-aware fine-tuning at low precision.

02. Misalignment between the local and global objectives causes the quantization difficulty.

03. Local error minimization has limited utility for very low-precision LLM compression.
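For reference, the two objectives contrasted in these findings can be written in a generic form. This is a sketch under assumptions, not necessarily the paper's exact notation: W_l is the full-precision weight matrix of layer l, X_l its calibration inputs, Q_b the set of matrices representable at b bits, L the model's loss, and D the available data.

```latex
% Post-training quantization: minimize each layer's local output error independently
\hat{W}_\ell = \arg\min_{\hat{W} \in \mathcal{Q}_b} \left\lVert X_\ell W_\ell - X_\ell \hat{W} \right\rVert_F^2, \qquad \ell = 1, \dots, L

% Quantization-aware fine-tuning: minimize the global loss over all layers jointly
(\hat{W}_1, \dots, \hat{W}_L) = \arg\min_{\hat{W}_1, \dots, \hat{W}_L \in \mathcal{Q}_b} \mathcal{L}(\hat{W}_1, \dots, \hat{W}_L; \mathcal{D})
```

In this notation, misalignment means that weights which nearly minimize the first, per-layer objective need not come close to minimizing the second, global one.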

Abstract

Large language models with high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low numerical precision. This can be achieved either through post-training quantization by minimizing local, layer-wise quantization errors, or through quantization-aware fine-tuning by minimizing the global loss function. In this study, we discovered that, under the same data constraint, the former approach nearly always fared worse than the latter, a phenomenon particularly prominent when the numerical precision is very low. We further showed that this difficulty of post-training quantization arose from stark misalignment between optimization of the local and global objective functions. Our findings explain the limited utility of minimizing local quantization error and the importance of direct quantization-aware fine-tuning, in the…
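To make the two quantities in the abstract concrete, here is a minimal NumPy sketch. It assumes a toy two-layer ReLU network, symmetric round-to-nearest quantization with one scale per tensor, and a mean-squared-error global loss against the full-precision outputs; the function names (`quantize_symmetric`, `local_error`, `forward`, `global_loss`) are hypothetical illustrations, not the paper's code or method. For the same quantized weights it reports each layer's local output error alongside the network's global loss.

```python
import numpy as np


def quantize_symmetric(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric round-to-nearest quantization with a single per-tensor scale (assumed scheme)."""
    assert bits >= 2, "this toy scheme needs at least one nonzero level on each side"
    qmax = 2 ** (bits - 1) - 1  # e.g. 1 for 2-bit, 7 for 4-bit
    scale = np.max(np.abs(w)) / qmax
    if scale == 0:
        return np.zeros_like(w)
    q = np.clip(np.round(w / scale), -qmax, qmax)  # integer grid in [-qmax, qmax]
    return q * scale  # dequantize back to floats


def local_error(w: np.ndarray, w_q: np.ndarray, x: np.ndarray) -> float:
    """Layer-wise objective: squared Frobenius norm of the output change on calibration inputs x."""
    return float(np.sum((x @ w - x @ w_q) ** 2))


def forward(weights: list[np.ndarray], x: np.ndarray) -> np.ndarray:
    """Toy MLP: linear layers with ReLU between them (no activation after the last layer)."""
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h


def global_loss(weights: list[np.ndarray], x: np.ndarray, y: np.ndarray) -> float:
    """Global objective: mean squared error of the whole network's output against targets y."""
    return float(np.mean((forward(weights, x) - y) ** 2))


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))  # hypothetical calibration batch
weights = [rng.normal(size=(16, 32)) / 4, rng.normal(size=(32, 8)) / 6]
y = forward(weights, x)  # assumption: targets are the full-precision model's outputs

# Inputs each layer sees in the full-precision model (used for the local objective).
layer_inputs = [x, np.maximum(x @ weights[0], 0.0)]

for bits in (8, 4, 2):
    q = [quantize_symmetric(w, bits) for w in weights]
    local = [local_error(w, wq, h) for w, wq, h in zip(weights, q, layer_inputs)]
    print(f"{bits}-bit  local={[round(e, 4) for e in local]}  global={global_loss(q, x, y):.6f}")
```

Measuring each layer's error on full-precision inputs is only one variant; some layer-wise methods instead feed each layer the outputs of already-quantized preceding layers. Either way, the local errors are computed layer by layer, while the global loss reflects how those errors compound through later layers and nonlinearities, which is why the two objectives need not agree.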

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis