A Comprehensive Evaluation on Quantization Techniques for Large Language Models

Yutong Liu; Cairong Zhao; Guosheng Hu

arXiv:2507.17417 · cs.LG · January 30, 2026

TL;DR

This paper provides a comprehensive, fair comparison of recent quantization techniques for large language models, analyzing their components, settings, and data formats to guide future improvements.

Contribution

It introduces a unified evaluation framework that decouples quantization methods into two steps (pre-quantization transformation and quantization error mitigation) and systematically compares quantization settings and data formats.
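A small example can make the quantization error mitigation step concrete. The sketch below applies rank-1 low-rank compensation on top of plain round-to-nearest (RTN) quantization. It computes the residual E = W - Q(W), estimates the leading singular direction of E by power iteration, and adds that rank-1 term back. This is an illustrative stand-in that assumes symmetric per-tensor INT4 RTN in place of GPTQ. The helper names (`rtn_quantize`, `rank1_compensation`, `frobenius`) are hypothetical and do not come from the paper.

```python
import math


def rtn_quantize(w, bits=4):
    """Symmetric per-tensor round-to-nearest (RTN) quantization of a matrix (list of rows).

    Returns the dequantized matrix so it can be compared with the original.
    Python's round() uses round-half-to-even, which is fine for this illustration.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4
    absmax = max(abs(v) for row in w for v in row)
    scale = absmax / qmax if absmax > 0 else 1.0
    return [[max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in row] for row in w]


def frobenius(a, b):
    """Frobenius norm of the difference a - b."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def rank1_compensation(w, wq, iters=50):
    """Return Q(W) + E v v^T, where E = W - Q(W) and v approximates E's top right singular vector."""
    rows, cols = len(w), len(w[0])
    e = [[w[i][j] - wq[i][j] for j in range(cols)] for i in range(rows)]
    v = [1.0] * cols  # power-iteration start vector
    for _ in range(iters):
        ev = [sum(e[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # E v
        v = [sum(e[i][j] * ev[i] for i in range(rows)) for j in range(cols)]  # E^T (E v)
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return [row[:] for row in wq]  # E v is zero, so the rank-1 term would vanish
        v = [x / norm for x in v]
    ev = [sum(e[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    # E v v^T equals sigma * u * v^T with sigma = ||E v|| and u = E v / sigma.
    return [[wq[i][j] + ev[i] * v[j] for j in range(cols)] for i in range(rows)]


w = [[0.12, -0.40, 0.31, 2.50],
     [0.05, 0.22, -0.18, 0.60],
     [-0.33, 0.10, 0.27, -1.10]]
wq = rtn_quantize(w)
w_comp = rank1_compensation(w, wq)
print(f"RTN error: {frobenius(w, wq):.4f}, RTN + rank-1 error: {frobenius(w, w_comp):.4f}")
```

Because v is a unit vector, the compensated error satisfies ||E - E v v^T||_F^2 = ||E||_F^2 - ||E v||^2. Adding the term therefore cannot increase the Frobenius error. A deployed model would typically keep the low-rank factors as a separate small higher-precision branch rather than folding them back into W. Here they are folded in only to make the error easy to measure.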

Findings

01. Optimized rotation and scaling improve pre-quantization performance.
02. Combining low-rank compensation with GPTQ can outperform GPTQ alone.
03. Finer quantization granularity improves performance but increases storage overhead.
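A toy comparison of per-tensor and per-group symmetric INT4 round-to-nearest quantization illustrates finding 03. With smaller groups, each scale tracks its local range, so a single outlier no longer inflates the step size for every weight. The cost is that each group stores its own scale. The sketch assumes one FP16 scale per group and no zero-points. The helper names are hypothetical, and the weights are made-up values rather than results from the paper.

```python
def quantize_groups(values, group_size, bits=4):
    """Symmetric absmax RTN with one scale per contiguous group; returns (dequantized, n_scales)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4
    out, n_scales = [], 0
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        absmax = max(abs(v) for v in group)
        scale = absmax / qmax if absmax > 0 else 1.0
        n_scales += 1
        out.extend(max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in group)
    return out, n_scales


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bits_per_weight(bits, group_size, scale_bits=16):
    """Average storage cost: the quantized value plus its share of the group scale."""
    return bits + scale_bits / group_size


weights = [0.1, -0.2, 0.15, 8.0, 0.05, -0.1, 0.2, -0.15]  # one outlier in the first half
per_tensor, s1 = quantize_groups(weights, group_size=len(weights))
per_group, s2 = quantize_groups(weights, group_size=4)
print(f"per-tensor: mse={mse(weights, per_tensor):.4f}, scales={s1}")
print(f"per-group:  mse={mse(weights, per_group):.4f}, scales={s2}")
print(bits_per_weight(4, 128), bits_per_weight(4, 32))
```

Under these assumptions, shrinking the group from 128 to 32 weights raises the average cost from 4.125 to 4.5 bits per weight. That increase is the storage overhead the finding refers to.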

Abstract

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often evaluated under different settings because a method typically contains multiple components. Analyzing connections among existing methods is important for deeper understanding. To bridge these gaps, we conduct an extensive review of state-of-the-art methods and perform comprehensive evaluations under the same conditions for fair comparison. To our knowledge, such a fair and extensive investigation remains critically underexplored. To better understand connections, first, we decouple published quantization methods into two steps: pre-quantization transformation and quantization error mitigation. The former is a preprocessing step that reduces outlier…
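A normalized Hadamard matrix H can illustrate the outlier-reduction idea behind rotation-based pre-quantization transformations. Because H is orthonormal, rotating both a weight row and its input leaves a linear layer's output unchanged: (Hw)·(Hx) = w^T H^T H x = w·x. At the same time, the rotation spreads a single large entry across all coordinates. The example assumes a power-of-two dimension and uses an explicit Sylvester construction. It is a hand-made toy rather than the paper's setup, and it does not model the optimized (learned) rotations some methods use.

```python
import math


def hadamard(n):
    """Normalized Sylvester Hadamard matrix of size n (n must be a power of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[v * s for v in row] for row in h]


def matvec(m, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def outlier_ratio(x):
    """Max magnitude relative to RMS; large values indicate a dominant outlier."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


x = [0.1, -0.2, 0.15, 8.0, 0.05, -0.1, 0.2, -0.15]  # activation with one outlier channel
w = [0.3, 0.1, -0.2, 0.05, 0.4, -0.3, 0.2, 0.1]     # one weight row of a linear layer
h = hadamard(len(x))
x_rot, w_rot = matvec(h, x), matvec(h, w)

print(f"outlier ratio before: {outlier_ratio(x):.2f}, after: {outlier_ratio(x_rot):.2f}")
print(f"output before: {dot(w, x):.6f}, after: {dot(w_rot, x_rot):.6f}")
```

Quantizing x_rot instead of x then works with a flatter range, which is the intended effect of the transformation. How the rotation is fused into a real model varies by method and is not modeled here.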
