Development of Quantized DNN Library for Exact Hardware Emulation
Masato Kiyama, Motoki Amagasaki, Masahiro Iida

TL;DR
This paper introduces PyParch, a DNN library that emulates the hardware behavior of quantized deep neural networks exactly, enabling overflow detection and precision estimation for large models such as YOLOv5.
Contribution
PyParch emulates the exact hardware behavior of quantized DNNs, including overflow detection, which existing libraries did not offer.
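To make overflow detection in a MAC operation concrete, here is a minimal sketch in plain Python. It is not PyParch's API: the function name `mac_with_overflow_check`, the `acc_bits` parameter, and the choice of two's-complement wraparound (rather than saturation) are all assumptions made for illustration.

```python
def mac_with_overflow_check(weights, activations, acc_bits=16):
    """Integer multiply-accumulate with a signed accumulator of acc_bits bits.

    Hypothetical helper (not part of PyParch). Returns the accumulator value
    and a flag telling whether the accumulator left its representable range
    at any step. Overflow is modeled as two's-complement wraparound, which is
    an assumption; some hardware saturates instead.
    """
    lo = -(1 << (acc_bits - 1))          # smallest representable value
    hi = (1 << (acc_bits - 1)) - 1       # largest representable value
    modulus = 1 << acc_bits

    acc = 0
    overflowed = False
    for w, a in zip(weights, activations):
        acc += w * a                     # exact integer product and sum
        if acc < lo or acc > hi:
            overflowed = True
            # Wrap back into range the way a fixed-width register would.
            acc = (acc - lo) % modulus + lo
    return acc, overflowed


if __name__ == "__main__":
    w = [127, 127, 127, 127]
    x = [127, 127, 127, 127]
    print(mac_with_overflow_check(w, x, acc_bits=16))  # overflow is flagged
    print(mac_with_overflow_check(w, x, acc_bits=32))  # no overflow
```

A floating-point emulation of the same dot product would return the mathematically exact sum and give no indication that a 16-bit accumulator had wrapped, which is the kind of discrepancy an exact emulator is meant to expose.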
Findings
PyParch accurately estimates QNN precision for large DNNs.
Overflow detection is successfully integrated into the emulation.
Emulation overhead is 5.6x for QNNs and 42x with overflow detection.
Abstract
Quantization is used to speed up execution and save power when running deep neural networks (DNNs) on edge devices such as AI chips. To investigate the effect of quantization, we need to quantize the 32-bit floating-point weights of a DNN to some bit width, convert them back to 32-bit floating-point precision, and then perform inference. This is because the DNN library can only handle floating-point numbers. However, this emulation does not reproduce the precision of the hardware exactly. We need exact precision to detect overflow in MAC operations and to verify operation on edge devices. We have developed PyParch, a DNN library that executes quantized DNNs (QNNs) with exactly the same behavior as hardware. In this paper, we describe the proposal and implementation of PyParch. As a result of the evaluation, the accuracy of QNNs with arbitrary bit widths…
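The quantize-then-convert-back procedure described in the abstract is commonly called fake quantization. The sketch below illustrates it in plain Python under stated assumptions: symmetric per-tensor scaling, round-to-nearest, and a hypothetical helper named `fake_quantize`. None of these details are taken from the paper.

```python
def fake_quantize(values, bits=8):
    """Quantize floats to signed `bits`-bit integers, then map them back.

    Hypothetical helper illustrating float-based emulation. Assumes symmetric
    per-tensor scaling, so the largest magnitude maps to the largest positive
    integer. Returns (integer codes, dequantized floats, scale).
    """
    qmax = (1 << (bits - 1)) - 1         # e.g. 127 for 8 bits
    qmin = -qmax - 1                     # e.g. -128 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0

    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(qmin, min(qmax, round(v / scale))) for v in values]
    # Convert back to floating point; inference then runs on these floats.
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


if __name__ == "__main__":
    weights = [1.27, -0.64, 0.333]
    codes, deq, scale = fake_quantize(weights, bits=8)
    print(codes)   # integer codes the hardware would store
    print(deq)     # float values a float-only library computes with
```

Because inference then runs on the dequantized floats, intermediate sums are accumulated in floating point rather than in fixed-width integer registers, so this style of emulation can match the rounding error of the weights while still missing integer effects such as accumulator overflow.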
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Data Storage Technologies
