Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie

TL;DR
This paper introduces a comprehensive benchmark for post-training quantization (PTQ) in large language models. It analyzes a range of PTQ strategies and their trade-offs, and provides practical recommendations for deployment and future research.
Contribution
It proposes a detailed taxonomy of PTQ methods, conducts extensive experiments across models and modalities, and offers insights into the strengths and trade-offs of different PTQ strategies.
Findings
Compensation-based techniques show strong cross-architecture robustness.
Ultra-low-bit PTQ for large models needs re-examination.
Combining PTQ strategies can achieve state-of-the-art robustness.
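To make "compensation-based" concrete, the sketch below follows the well-known OBQ/GPTQ-style update for a single weight row, since GPTQ is the commonly cited example of this family. Each weight is rounded in turn, and its rounding error is redistributed onto the not-yet-quantized weights using the inverse of a Hessian H = XXᵀ built from calibration inputs. This is a minimal pure-Python sketch and not the benchmark's code. It assumes a per-row asymmetric grid, a fixed left-to-right order, and GPTQ-style diagonal damping. The names `make_grid`, `quantize_scalar`, `invert`, `compensated_quantize`, and the `damp=0.01` default are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def make_grid(w: List[float], bits: int) -> Tuple[float, int, int]:
    """Asymmetric per-row grid; the range is widened to include 0 so the zero point is exact."""
    qmax = 2 ** bits - 1
    lo, hi = min(min(w), 0.0), max(max(w), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)
    return scale, zero, qmax


def quantize_scalar(x: float, scale: float, zero: int, qmax: int) -> float:
    """Round to nearest on the integer grid, clamp to [0, qmax], then dequantize."""
    q = min(max(round(x / scale) + zero, 0), qmax)
    return (q - zero) * scale


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting; adequate for the small matrices used here."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def compensated_quantize(w: List[float], hessian: Matrix, bits: int = 4,
                         damp: float = 0.01) -> List[float]:
    """Quantize one weight row left to right, pushing each rounding error onto later weights."""
    n = len(w)
    scale, zero, qmax = make_grid(w, bits)  # grid fixed from the original weights
    # Hypothetical damping default: add a fraction of the mean diagonal so H is invertible.
    mean_diag = sum(hessian[i][i] for i in range(n)) / n
    h = [[hessian[r][c] + (damp * mean_diag if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    hinv = invert(h)
    w = list(w)  # working copy that receives the compensation updates
    q = [0.0] * n
    for i in range(n):
        q[i] = quantize_scalar(w[i], scale, zero, qmax)
        d = hinv[i][i]
        err = (w[i] - q[i]) / d
        # OBQ/GPTQ update: spread the scaled error along column i of H^-1.
        for j in range(i + 1, n):
            w[j] -= err * hinv[j][i]
        # Downdate H^-1 so that index i no longer influences later updates.
        col = [hinv[r][i] for r in range(n)]
        row_i = hinv[i][:]
        for r in range(n):
            for c in range(n):
                hinv[r][c] -= col[r] * row_i[c] / d
    return q


if __name__ == "__main__":
    # Toy calibration activations X (6 input features x 8 samples) and H = X X^T.
    X = [[math.sin(1.3 * a * k + 0.5 * a + 0.3 * k) for k in range(8)] for a in range(6)]
    H = [[sum(X[a][k] * X[b][k] for k in range(8)) for b in range(6)] for a in range(6)]
    w = [0.9, -0.3, 0.45, -1.2, 0.1, 0.7]
    print(compensated_quantize(w, H, bits=3))
```

A useful sanity check on the design: with a diagonal H, the off-diagonal entries of H⁻¹ are zero, so no error is redistributed and the procedure reduces to plain round-to-nearest. The compensation only matters when input features are correlated.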
Abstract
Post-training quantization (PTQ) has been widely adopted for compressing large language models (LLMs) owing to its efficiency and low resource requirements. However, current research lacks an in-depth analysis of the strengths and applicable scenarios of each PTQ strategy. In addition, existing algorithms focus primarily on performance, overlooking the trade-off among model size, performance, and quantization bitwidth. To address these gaps, we provide a novel benchmark for LLM PTQ in this paper. First, to support our benchmark, we propose a comprehensive taxonomy of existing mainstream methods by scrutinizing their computational strategies (e.g., optimization-based, compensation-based, etc.). Then, we conduct extensive experiments with the baseline within each class, covering models with various sizes (7B-70B), bitwidths, training levels…
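The trade-off among model size, performance, and bitwidth can be made concrete with a little arithmetic. The sketch below is a minimal pure-Python illustration. It assumes round-to-nearest (RTN) asymmetric quantization with groups of 128 weights and 32 bits of per-group metadata (for example, an fp16 scale plus an fp16 zero point). RTN is used only as the simplest PTQ baseline, not as any method evaluated in the benchmark, and `rtn_quantize`, `mse`, and `weight_memory_gb` are hypothetical names. The sketch reports reconstruction error on synthetic weights and an approximate weight-only memory footprint.

```python
import math
from typing import List


def rtn_quantize(weights: List[float], bits: int, group_size: int = 128) -> List[float]:
    """Round-to-nearest, asymmetric, per-group fake quantization (quantize then dequantize)."""
    qmax = 2 ** bits - 1
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Widen the range to include 0 so the zero point is exactly representable.
        lo, hi = min(min(group), 0.0), max(max(group), 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero = round(-lo / scale)
        for x in group:
            q = min(max(round(x / scale) + zero, 0), qmax)
            out.append((q - zero) * scale)
    return out


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared reconstruction error between two equal-length weight lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weight_memory_gb(n_params: float, bits: int, group_size: int = 128,
                     meta_bits: int = 32) -> float:
    """Approximate weight storage in GB (1e9 bytes).

    Assumes packed `bits`-bit integers plus `meta_bits` of metadata per group;
    activations, the KV cache, and unquantized layers (e.g. embeddings) are ignored.
    """
    bits_per_weight = bits + meta_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Smooth synthetic weights, for illustration only.
    w = [math.sin(0.37 * i) * (1 + (i % 7) / 10) for i in range(1024)]
    for b in (2, 3, 4, 8):
        print(f"{b}-bit RTN MSE: {mse(w, rtn_quantize(w, b)):.2e}")
    for n, label in ((7e9, "7B"), (70e9, "70B")):
        print(label, "fp16:", n * 2 / 1e9, "GB;",
              "4-bit:", round(weight_memory_gb(n, 4), 2), "GB;",
              "3-bit:", round(weight_memory_gb(n, 3), 2), "GB")
```

Under these assumptions, a 70B model at 3 bits needs about 28.4 GB of weight storage. That is roughly twice the 14 GB of a 7B model in fp16. Whether the larger low-bit model is the better choice depends on how much accuracy the quantizer preserves at that bitwidth, which is the kind of question a benchmark like this one is meant to answer.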
