Streamlined Deployment for Quantized Neural Networks
Yaman Umuroglu, Magnus Jahre

TL;DR
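This paper presents a streamlined approach to deploying quantized neural networks (QNNs) on mobile CPUs. It converts all inference operations to integer form and performs low-precision arithmetic bit-serially with common bitwise operations. With this approach, a quantized AlexNet runs 3.5x faster than an 8-bit baseline.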
Contribution
It introduces a novel flow for converting QNN inference to integer operations and a bit-serial technique for efficient deployment on common CPU architectures.
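The integer-conversion idea can be illustrated with a minimal Python sketch. It is not the authors' code, and all names and parameters are hypothetical. The sketch assumes a layer whose integer accumulator `acc` is rescaled by a quantization `scale` and passed through batch normalization (`gamma`, `beta`, `mu`, `sigma`). The result is then quantized by counting how many real-valued thresholds it reaches. When `gamma * scale / sigma > 0`, that chain is monotonically increasing in `acc`, so each real threshold can be folded into an integer threshold on `acc`. Inference then needs only integer comparisons.

```python
import math


def fold_thresholds(real_thresholds, scale, gamma, beta, mu, sigma):
    """Fold dequantization scale and batch norm into integer thresholds on acc.

    Assumed real-valued chain: y = gamma * (scale * acc - mu) / sigma + beta.
    Sketch assumption: the slope gamma * scale / sigma is positive, so y grows with acc.
    """
    slope = gamma * scale / sigma
    if slope <= 0:
        raise ValueError("this sketch only handles a positive slope")
    offset = beta - gamma * mu / sigma  # y = slope * acc + offset
    # y >= tau  <=>  acc >= (tau - offset) / slope; acc is an integer, so round up.
    return [math.ceil((tau - offset) / slope) for tau in real_thresholds]


def multi_threshold(acc, int_thresholds):
    """Integer-only activation: number of thresholds the accumulator reaches."""
    return sum(1 for t in int_thresholds if acc >= t)


def reference_activation(acc, real_thresholds, scale, gamma, beta, mu, sigma):
    """Floating-point reference: scale, batch-normalize, then count thresholds."""
    y = gamma * (scale * acc - mu) / sigma + beta
    return sum(1 for tau in real_thresholds if y >= tau)


# Example with illustrative numbers: a 2-bit activation has three real thresholds.
T = fold_thresholds([0.5, 1.5, 2.5], scale=0.5, gamma=2.0, beta=0.25, mu=1.0, sigma=4.0)
print(T)  # [3, 7, 11]
```

After folding, the per-layer floating-point work disappears. Each output activation is just the count of integer thresholds its accumulator reaches. A negative slope would require flipping the comparisons, which this sketch rejects rather than handles.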
Findings
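All QNN inference operations can be converted to integer operations, which mainstream ISAs support natively, unlike highly quantized datatypes.
Bit-serial processing, which handles one bit position at a time using common bitwise operations, enables efficient QNN deployment on mobile CPUs.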
Quantized AlexNet runs 3.5x faster than an 8-bit baseline.
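The bit-serial idea can be sketched for unsigned operands in Python. Each vector is split into bit planes, with bit b of every element packed into one bitmask. The dot product then becomes a sum, over activation bit i and weight bit j, of 2^(i+j) · popcount(a_i AND w_j). This is an illustrative sketch, not the authors' implementation. The helper names are hypothetical, and signed operands are not handled. Python integers stand in for the fixed-width registers a real CPU kernel would use, for example 64-bit words with a hardware population count.

```python
def to_bit_planes(values, bits):
    """Return one bitmask per bit position; bit idx of plane b is bit b of values[idx]."""
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} unsigned bits")
    planes = []
    for b in range(bits):
        mask = 0
        for idx, v in enumerate(values):
            if (v >> b) & 1:
                mask |= 1 << idx
        planes.append(mask)
    return planes


def popcount(x):
    """Count set bits (portable stand-in for a hardware popcount instruction)."""
    return bin(x).count("1")


def bit_serial_dot(a, w, a_bits, w_bits):
    """Unsigned dot product computed one bit position at a time with AND + popcount."""
    if len(a) != len(w):
        raise ValueError("vectors must have equal length")
    a_planes = to_bit_planes(a, a_bits)
    w_planes = to_bit_planes(w, w_bits)
    acc = 0
    for i, ap in enumerate(a_planes):
        for j, wp in enumerate(w_planes):
            # Each matching pair of set bits contributes 2^(i + j) to the product sum.
            acc += popcount(ap & wp) << (i + j)
    return acc


print(bit_serial_dot([3, 1, 2, 0], [1, 2, 3, 1], a_bits=2, w_bits=2))  # 11
```

The cost grows with the product of the two bit widths, so this formulation pays off mainly at very low precisions. There, a single AND and popcount process one bit position of many elements at once.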
Abstract
Running Deep Neural Network (DNN) models on devices with limited computational capability is a challenge due to large compute and memory requirements. Quantized Neural Networks (QNNs) have emerged as a potential solution to this problem, promising to offer most of the DNN accuracy benefits with much lower computational cost. However, harvesting these benefits on existing mobile CPUs is a challenge since operations on highly quantized datatypes are not natively supported in most instruction set architectures (ISAs). In this work, we first describe a streamlining flow to convert all QNN inference operations to integer ones. Afterwards, we provide techniques based on processing one bit position at a time (bit-serial) to show how QNNs can be efficiently deployed using common bitwise operations. We demonstrate the potential of QNNs on mobile CPUs with microbenchmarks and on a quantized…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
