Quantized Neural Networks: Characterization and Holistic Optimization
Yoonho Boo, Sungho Shin, and Wonyong Sung

TL;DR
This paper presents a holistic approach to optimizing quantized deep neural networks (QDNNs) that treats architecture design and training methods together, and shows how model depth and width affect sensitivity to weight and activation quantization.
Contribution
It introduces a holistic optimization framework for QDNNs that combines quantization-friendly architecture design with QDNN training methods, and it uses synthesized data to visualize the effects of weight and activation quantization on different model structures.
Findings
Deeper models are more sensitive to activation quantization.
Wider models enhance robustness to weight and activation quantization.
Holistic optimization improves QDNN performance.
Abstract
Quantized deep neural networks (QDNNs) are necessary for low-power, high throughput, and embedded applications. Previous studies mostly focused on developing optimization methods for the quantization of given models. However, quantization sensitivity depends on the model architecture. Therefore, the model selection needs to be a part of the QDNN design process. Also, the characteristics of weight and activation quantization are quite different. This study proposes a holistic approach for the optimization of QDNNs, which contains QDNN training methods as well as quantization-friendly architecture design. Synthesized data is used to visualize the effects of weight and activation quantization. The results indicate that deeper models are more prone to activation quantization, while wider models improve the resiliency to both weight and activation quantization. This study can provide insight…
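To make the distinction between weight and activation quantization concrete, here is a minimal sketch in plain Python. It assumes a uniform quantizer scaled to the largest absolute value. The names `quantize_uniform`, `dense_relu`, and `mse`, the per-row weight scale, the 2-bit setting, and the random toy layer are all illustrative assumptions, not the authors' quantizer, models, or data.

```python
import random

def quantize_uniform(values, num_bits, signed=True):
    """Round each value to the nearest level of a uniform grid scaled to max |value|.

    Assumed scheme for illustration; the signed case needs num_bits >= 2.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)  # all zeros: nothing to scale
    if signed:
        top = 2 ** (num_bits - 1) - 1  # symmetric grid, e.g. 2 bits -> {-1, 0, 1}
        bottom = -top
    else:
        top = 2 ** num_bits - 1        # non-negative grid, e.g. 2 bits -> {0, 1, 2, 3}
        bottom = 0
    step = max_abs / top
    return [min(max(round(v / step), bottom), top) * step for v in values]

def dense_relu(inputs, weights):
    """One fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)  # toy data, not taken from the paper
weights = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
inputs = [rng.gauss(0.0, 1.0) for _ in range(8)]
reference = dense_relu(inputs, weights)

# Weight quantization: applied once to the stored parameters (one scale per row here).
q_weights = [quantize_uniform(row, num_bits=2) for row in weights]
weight_only = dense_relu(inputs, q_weights)

# Activation quantization: applied to the layer output at run time, for every input.
act_only = quantize_uniform(reference, num_bits=2, signed=False)

print("error from weight quantization:    ", mse(reference, weight_only))
print("error from activation quantization:", mse(reference, act_only))
```

The sketch only shows where each kind of quantization enters a layer. It says nothing about how the errors behave as depth or width grows, which is what the study characterizes.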
