Streamlined Deployment for Quantized Neural Networks

Yaman Umuroglu; Magnus Jahre

arXiv:1709.04060 · cs.CV · May 31, 2018 · 28 cites

TL;DR

This paper describes a streamlining flow that converts quantized neural network (QNN) inference to integer-only operations, then uses bit-serial processing to deploy QNNs on mobile CPUs. A quantized AlexNet runs 3.5x faster than an 8-bit baseline.

Contribution

It introduces a novel flow for converting QNN inference to integer operations and a bit-serial technique for efficient deployment on common CPU architectures.

Findings

01

All QNN inference operations can be converted to integer ones.

02

Bit-serial processing enables efficient QNN deployment on mobile CPUs.

03

Quantized AlexNet runs 3.5x faster than an 8-bit baseline.
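As an illustration of the first finding, the sketch below shows one way a non-integer step can be folded into integer comparisons. It is a minimal example under stated assumptions, not the paper's exact flow: it assumes a positive scale, a uniform quantizer that floors and clips to `bits` bits, and integer accumulator inputs. The function names (`float_style_activation`, `integer_thresholds`, `threshold_activation`) are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def float_style_activation(acc, scale, shift, bits):
    """Reference path: affine transform, then floor and clip to [0, 2**bits - 1]."""
    level = floor(scale * acc + shift)
    return max(0, min(2**bits - 1, level))


def integer_thresholds(scale, shift, bits):
    """Integer thresholds t_k with: output >= k exactly when acc >= t_k.

    Assumes scale > 0 and an integer accumulator (illustrative assumption).
    """
    return [ceil((k - shift) / scale) for k in range(1, 2**bits)]


def threshold_activation(acc, thresholds):
    """Integer-only path: the output is the number of thresholds reached."""
    return sum(acc >= t for t in thresholds)


# Exact rationals avoid floating-point rounding at threshold boundaries.
scale, shift, bits = Fraction(3, 10), Fraction(-1, 2), 2
thresholds = integer_thresholds(scale, shift, bits)
```

With these assumed parameters, `threshold_activation(acc, thresholds)` matches `float_style_activation(acc, scale, shift, bits)` for every integer `acc`, so the scale and shift are needed only once, when the thresholds are computed.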

Abstract

Running Deep Neural Network (DNN) models on devices with limited computational capability is a challenge due to large compute and memory requirements. Quantized Neural Networks (QNNs) have emerged as a potential solution to this problem, promising to offer most of the DNN accuracy benefits with much lower computational cost. However, harvesting these benefits on existing mobile CPUs is a challenge since operations on highly quantized datatypes are not natively supported in most instruction set architectures (ISAs). In this work, we first describe a streamlining flow to convert all QNN inference operations to integer ones. Afterwards, we provide techniques based on processing one bit position at a time (bit-serial) to show how QNNs can be efficiently deployed using common bitwise operations. We demonstrate the potential of QNNs on mobile CPUs with microbenchmarks and on a quantized…
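The bit-serial idea in the abstract can be sketched for a single dot product. This is a minimal illustration, assuming unsigned operands and using Python integers as packed bit vectors; the helper names `to_bitplanes` and `bitserial_dot` are hypothetical and are not taken from the paper or its repository. Each pair of bit positions contributes a bitwise AND followed by a population count, weighted by the corresponding power of two.

```python
def to_bitplanes(values, bits):
    """Pack bit position i of every element into one integer (bit k = element k)."""
    planes = []
    for i in range(bits):
        plane = 0
        for k, v in enumerate(values):
            plane |= ((v >> i) & 1) << k
        planes.append(plane)
    return planes


def bitserial_dot(x, y, x_bits, y_bits):
    """Dot product of unsigned integer vectors, one bit position at a time."""
    x_planes = to_bitplanes(x, x_bits)
    y_planes = to_bitplanes(y, y_bits)
    total = 0
    for i, xi in enumerate(x_planes):
        for j, yj in enumerate(y_planes):
            # AND selects positions where both bits are set; popcount sums them.
            popcount = bin(xi & yj).count("1")
            total += popcount << (i + j)
    return total


x = [3, 1, 2, 0]  # 2-bit unsigned values
y = [1, 2, 3, 1]  # 2-bit unsigned values
result = bitserial_dot(x, y, 2, 2)
```

In this formulation the work grows with the product of the two bit widths, so lower precision means fewer AND and popcount passes over the packed data.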

Code & Models

Repositories

EECS-NTNU/bismo

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices