Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis

Jiaqi Zhao; Ming Wang; Miao Zhang; Yuzhang Shang; Xuebo Liu; Yaowei Wang; Min Zhang; Liqiang Nie

arXiv:2502.13178·cs.LG·May 22, 2025

TL;DR

This paper introduces a comprehensive benchmark for post-training quantization (PTQ) in large language models, analyzes the trade-offs of various PTQ strategies, and provides practical recommendations for deployment and future research.

Contribution

It proposes a detailed taxonomy of PTQ methods, conducts extensive experiments across models and modalities, and offers insights into the strengths and trade-offs of different PTQ strategies.

Findings

01 Compensation-based techniques show strong cross-architecture robustness.

02 Ultra low-bit PTQ for large models needs reexamination.

03 Combining PTQ strategies can achieve state-of-the-art robustness.

Abstract

Post-training quantization (PTQ) has been extensively adopted for compressing large language models (LLMs) owing to its efficiency and low resource requirements. However, current research lacks an in-depth analysis of the strengths and applicable scenarios of each PTQ strategy. In addition, existing algorithms focus primarily on performance, overlooking the trade-off among model size, performance, and quantization bitwidth. To resolve these ambiguities, we provide a novel benchmark for LLM PTQ in this paper. First, to support our benchmark, we propose a comprehensive taxonomy of existing mainstream methods by scrutinizing their computational strategies (e.g., optimization-based, compensation-based). Then, we conduct extensive experiments with the baseline within each class, covering models with various sizes (7B-70B), bitwidths, training levels…
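As a rough illustration of the size-versus-bitwidth side of the trade-off mentioned in the abstract, the sketch below estimates weight storage for the two ends of the 7B-70B range at a few bitwidths. It is back-of-the-envelope arithmetic, not the paper's methodology: it assumes weight-only storage, counts 1 GB as 10^9 bytes, and ignores per-group scales, zero-points, activations, and the KV cache. The helper name `weight_memory_gb` and the chosen bitwidths (16, 4, 2) are illustrative assumptions and do not come from the paper or from PTQ-Bench.

```python
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: every parameter is stored at `bits` bits and
    quantization metadata (scales, zero-points) is ignored.
    """
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    # Illustrative grid: the two ends of the 7B-70B range at three bitwidths.
    for params in (7e9, 70e9):
        for bits in (16, 4, 2):
            size = weight_memory_gb(params, bits)
            print(f"{params / 1e9:.0f}B @ {bits}-bit: {size:.2f} GB")
```

Under these assumptions, a 70B model at 2 bits (about 17.5 GB) occupies roughly the same weight memory as a 7B model at 16 bits (about 14 GB), which is the kind of comparison a size/performance/bitwidth trade-off analysis has to weigh.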

Code & Models

Repositories

zjq0455/PTQ-Bench (PyTorch, official)
