Hardware-friendly Deep Learning by Network Quantization and Binarization
Haotong Qin

TL;DR
This paper explores network quantization and binarization techniques to make deep learning models more efficient and hardware-friendly, especially for resource-limited devices, by addressing challenges across diverse architectures and complex scenes.
Contribution
It provides a comprehensive analysis of quantization challenges and pushes the limits of network compression and acceleration for practical, real-world applications.
Findings
Quantization can significantly reduce model size and computation.
Extremely compressed networks can maintain acceptable accuracy, though the abstract notes that quantization still causes a significant accuracy drop.
Addressing diverse architectures broadens applicability.
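To make the size-reduction claim concrete, the following is a back-of-the-envelope sketch of weight storage at different bit-widths. The function name `weight_storage_bytes` and the one-million-parameter example are hypothetical and not taken from the paper. The sketch also ignores scale factors and any other per-layer metadata.

```python
def weight_storage_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store num_params weights at the given bit-width.

    Assumption: weights are tightly packed and no extra metadata
    (scales, zero-points) is counted. Rounds up to a whole byte.
    """
    return (num_params * bits + 7) // 8


# Hypothetical model with one million weights.
NUM_PARAMS = 1_000_000

fp32_bytes = weight_storage_bytes(NUM_PARAMS, 32)  # full precision
int8_bytes = weight_storage_bytes(NUM_PARAMS, 8)   # 8-bit quantization
bin_bytes = weight_storage_bytes(NUM_PARAMS, 1)    # binarization (1 bit)

print("fp32:", fp32_bytes, "bytes")
print("int8:", int8_bytes, "bytes", "ratio:", fp32_bytes / int8_bytes)
print("1-bit:", bin_bytes, "bytes", "ratio:", fp32_bytes / bin_bytes)
```

Under these assumptions, 8-bit quantization stores the weights in a quarter of the space of 32-bit floats, and binarization in one thirty-second. Real savings are somewhat smaller once scaling factors and unquantized layers are counted.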
Abstract
Quantization is emerging as an efficient approach to promoting hardware-friendly deep learning and running deep neural networks on resource-limited hardware. However, it still causes a significant drop in network accuracy. We group the challenges of quantization into two categories: Quantization for Diverse Architectures and Quantization on Complex Scenes. Our studies focus mainly on applying quantization to various architectures and scenes and on pushing quantization to its limit to compress and accelerate networks as far as possible. This comprehensive research on quantization aims to achieve more powerful, more efficient, and more flexible hardware-friendly deep learning and to make it better suited to real-world applications.
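As a minimal illustration of the two operations the abstract names, the sketch below applies symmetric uniform quantization and sign-based binarization to a short list of weights in pure Python. It is not the author's method. The function names, the per-tensor symmetric scheme, and the mean-absolute-value scaling factor for binarization are common textbook choices assumed here for illustration.

```python
def quantize_uniform(weights, bits=8):
    """Symmetric per-tensor uniform quantization (illustrative assumption).

    Returns the integer codes, the scale, and the dequantized values.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    dequantized = [c * scale for c in codes]
    return codes, scale, dequantized


def binarize(weights):
    """1-bit quantization: keep only the sign, plus one shared scale.

    The scale alpha is the mean absolute weight (an assumed, commonly
    used choice); zero is mapped to +1 by convention here.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    approx = [s * alpha for s in signs]
    return signs, alpha, approx


weights = [0.5, -1.0, 0.25, 0.0]               # hypothetical weights
codes, scale, deq = quantize_uniform(weights, bits=8)
signs, alpha, approx = binarize(weights)

# Worst-case reconstruction error of each scheme on this toy tensor.
err_int8 = max(abs(w - d) for w, d in zip(weights, deq))
err_bin = max(abs(w - a) for w, a in zip(weights, approx))
print("8-bit max error:", err_int8)
print("1-bit max error:", err_bin)
```

On this toy input the binarized weights deviate from the originals far more than the 8-bit ones do, which mirrors the accuracy drop the abstract describes at extreme bit-widths.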
