Development of Quantized DNN Library for Exact Hardware Emulation

Masato Kiyama; Motoki Amagasaki; Masahiro Iida

arXiv:2106.08892 · cs.LG · June 17, 2021

TL;DR

This paper introduces PyParch, a library that emulates the hardware behavior of quantized deep neural networks exactly, enabling overflow detection and precision estimation for complex models like YOLOv5.

Contribution

PyParch emulates the exact hardware behavior of quantized DNNs, including overflow detection, which existing libraries did not provide.
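To make the idea of overflow detection in a MAC (multiply-accumulate) unit concrete, here is a minimal sketch in plain Python. It is not PyParch's API: the function names `wrap_signed` and `mac_with_overflow`, the `acc_bits` parameter, and the choice of two's-complement wraparound (rather than saturation) are all assumptions made for illustration. Real hardware may behave differently.

```python
def wrap_signed(value, bits):
    """Wrap a Python int into the two's-complement range of a `bits`-wide register."""
    mask = (1 << bits) - 1
    value &= mask
    # Values with the top bit set represent negative numbers.
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def mac_with_overflow(weights, activations, acc_bits=16):
    """Accumulate integer products in a hypothetical `acc_bits`-wide accumulator.

    Returns the wrapped accumulator value and a flag that is True if any
    partial sum left the representable range.
    """
    acc = 0
    overflowed = False
    for w, a in zip(weights, activations):
        exact = acc + int(w) * int(a)      # unbounded Python integer arithmetic
        acc = wrap_signed(exact, acc_bits)  # what a narrow register would hold
        if acc != exact:
            overflowed = True
    return acc, overflowed


# Four products of 127 * 127 do not fit in a signed 16-bit accumulator.
narrow = mac_with_overflow([127] * 4, [127] * 4, acc_bits=16)
wide = mac_with_overflow([127] * 4, [127] * 4, acc_bits=32)
```

A floating-point emulation would return the mathematically exact sum in both cases, so the overflow in the narrow accumulator would go unnoticed. Checking every partial sum is also a plausible reason why overflow detection adds runtime overhead.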

Findings

01. PyParch accurately estimates QNN precision for large DNNs.

02. Overflow detection is successfully integrated into the emulation.

03. Emulation overhead is 5.6x for QNNs and 42x with overflow detection.

Abstract

Quantization is used to speed up execution and save power when running deep neural networks (DNNs) on edge devices such as AI chips. To investigate the effect of quantization, we need to perform inference after quantizing the 32-bit floating-point weights of a DNN to some bit width and then converting them back to 32-bit floating-point precision. This is because the DNN library can only handle floating-point numbers. However, this emulation does not reproduce the precision of the hardware exactly. Exact precision is needed to detect overflow in MAC operations and to verify operation on edge devices. We have developed PyParch, a DNN library that executes quantized DNNs (QNNs) with exactly the same behavior as hardware. In this paper, we describe a new proposal and implementation of PyParch. As a result of the evaluation, the accuracy of QNNs with arbitrary bit widths…
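The quantize-then-convert-back workflow that the abstract describes as the conventional approach can be sketched with NumPy as follows. This is an illustrative assumption, not code from the paper: the function name `fake_quantize`, the symmetric per-tensor scaling scheme, and the sample weights are all hypothetical.

```python
import numpy as np


def fake_quantize(w, bits=8):
    """Quantize float32 weights to signed `bits`-wide integers, then map back to float32.

    Uses symmetric per-tensor scaling (an assumption for this sketch).
    Returns the round-tripped float32 weights, the integer codes, and the scale.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Integer codes that a quantized accelerator would store.
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    # Back to float32 so that a floating-point DNN library can consume them.
    w_fq = (q * scale).astype(np.float32)
    return w_fq, q, scale


w = np.array([0.4, -1.0, 0.26, 0.0], dtype=np.float32)
w_fq, q, scale = fake_quantize(w, bits=4)
```

After the round trip, every later multiply and accumulate still runs in floating point, so integer effects such as accumulator overflow never appear. That is the gap the abstract points to when it says this kind of emulation is not exact.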

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Data Storage Technologies