Neural Network Quantization for Efficient Inference: A Survey
Olivia Weng

TL;DR
This survey reviews recent neural network quantization techniques aimed at reducing model size and complexity to enable efficient deployment on resource-constrained devices, and discusses future research directions.
Contribution
It provides a comprehensive overview and comparison of neural network quantization methods developed over the past decade and suggests future research directions.
Findings
Quantization techniques vary in accuracy and efficiency.
Certain methods are more suitable for specific hardware constraints.
The survey identifies gaps and opportunities for future research.
Abstract
As neural networks have become more powerful, there has been a rising desire to deploy them in the real world; however, the power and accuracy of neural networks are largely due to their depth and complexity, which makes them difficult to deploy, especially on resource-constrained devices. Neural network quantization has recently arisen to meet this demand: it reduces the size and complexity of a neural network by reducing the precision of the network. With smaller and simpler networks, it becomes possible to run neural networks within the constraints of their target hardware. This paper surveys the many neural network quantization techniques that have been developed in the last decade. Based on this survey and comparison of neural network quantization techniques, we propose future directions of research in the area.
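To make "reducing the precision of a network" concrete, here is a minimal sketch of uniform affine quantization, one common scheme among the many such a survey would cover. It is an illustration under stated assumptions, not the method of any particular paper: the function names `quantize` and `dequantize`, the unsigned 8-bit range, and the per-tensor min/max calibration are all choices made here for illustration.

```python
from typing import List, Tuple


def quantize(values: List[float], num_bits: int = 8) -> Tuple[List[int], float, int]:
    """Map floats to unsigned integers with a uniform affine scheme (illustrative)."""
    qmin, qmax = 0, (1 << num_bits) - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Step size between adjacent integer levels; fall back to 1.0 for all-zero input.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer level that represents the real value 0.0, clamped to the valid range.
    zero_point = max(qmin, min(qmax, int(round(qmin - lo / scale))))
    # Round each value to the nearest level and clamp to [qmin, qmax].
    q = [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats from the integer levels."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # hypothetical weight values
    q, scale, zero_point = quantize(weights)
    print(q, scale, zero_point)
    print(dequantize(q, scale, zero_point))
```

Each 8-bit level takes a quarter of the storage of a 32-bit float, at the cost of a rounding error of roughly one quantization step or less per value. That trade-off between size and accuracy is what quantization methods try to manage.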
Taxonomy
Topics: Neural Networks and Applications · Fault Detection and Control Systems · Advanced Neural Network Applications
