Low-bit Model Quantization for Deep Neural Networks: A Survey

Kai Liu; Qian Zheng; Kaiwen Tao; Zhiteng Li; Haotong Qin; Wenbo Li; Yong Guo; Xianglong Liu; Linghe Kong; Guihai Chen; Yulun Zhang; Xiaokang Yang

arXiv:2505.05530·cs.LG·May 12, 2025

TL;DR

This survey reviews recent advances in low-bit quantization techniques for deep neural networks, highlighting methods to reduce model size and computation while addressing performance trade-offs.

Contribution

It classifies and compares state-of-the-art low-bit quantization methods, providing a comprehensive overview and identifying future research directions.

Findings

01. Classification of quantization methods into 8 main categories
02. Comparison of techniques based on core principles
03. Identification of research gaps and opportunities

Abstract

With unprecedented rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and model sizes are usually unacceptable in real-world deployment. Model quantization, an effective lightweighting technique, has become an indispensable procedure in the whole deployment pipeline. The essence of quantization acceleration is the conversion from continuous floating-point numbers to discrete integer ones, which significantly speeds up memory I/O and calculation, i.e., addition and multiplication. However, performance degradation also comes with the conversion because of the loss of precision. Therefore, it has become increasingly popular and critical to investigate how to perform the conversion and how to compensate for the information loss. This article surveys the recent five-year progress towards low-bit quantization on…
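The float-to-integer conversion and the precision loss that the abstract describes can be illustrated with a minimal sketch. It assumes a uniform affine (asymmetric) scheme with a per-tensor scale and zero point, which is one common choice and not necessarily the formulation the survey uses. The function names `quantize` and `dequantize` and the sample `weights` are hypothetical, chosen only for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers with a uniform affine scheme (illustrative)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 for constant input to avoid dividing by zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer code that represents (approximately) the real value 0.0.
    zero_point = round(qmin - lo / scale)
    # Round to the nearest integer code and clamp to the representable range.
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [scale * (qi - zero_point) for qi in q]


# Hypothetical sample weights; real tensors would be much larger.
weights = [-1.0, -0.25, 0.0, 0.4, 1.0]
errors = {}
for bits in (8, 4, 2):
    q, scale, zp = quantize(weights, bits)
    restored = dequantize(q, scale, zp)
    # Worst-case reconstruction error at this bit-width.
    errors[bits] = max(abs(w - r) for w, r in zip(weights, restored))
    print(bits, q, round(errors[bits], 4))
```

In this sketch the worst-case reconstruction error grows as the bit-width shrinks, which is the precision loss that low-bit methods try to compensate for.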

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Advanced Data Compression Techniques