Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers

Manuele Rusci; Alessandro Capotondi; Luca Benini

arXiv:1905.13082·cs.LG·May 31, 2019·27 cites

TL;DR

This paper introduces a memory-driven, mixed low-bitwidth quantization method for deploying deep neural networks on microcontrollers. An iterative, rule-based procedure lowers the bit precision of the most memory-demanding tensors until the model fits the device's memory, and quantization-aware retraining then recovers accuracy.
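The quantization referred to above maps each tensor onto a uniform grid of 2^b levels, with b chosen from 8, 4 or 2 bits according to the abstract. As an illustration only, here is a minimal sketch of a generic min/max asymmetric uniform quantizer and its "fake-quantize" (quantize, then dequantize) counterpart of the kind used during quantization-aware retraining. The function names and the min/max range choice are assumptions, not the paper's exact scheme.

```python
def quantize(values, bits):
    """Map floats to unsigned integer codes in [0, 2**bits - 1] on a uniform min/max grid."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)  # integer offset so that `lo` lands near code 0
    codes = [min(levels, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate real values from integer codes."""
    return [scale * (c - zero_point) for c in codes]


def fake_quantize(values, bits):
    """Quantize then dequantize, so a float graph 'sees' the rounding error during retraining."""
    codes, scale, zero_point = quantize(values, bits)
    return dequantize(codes, scale, zero_point)


# Hypothetical weights: fewer bits means a coarser grid and a larger rounding error.
weights = [-0.8, -0.1, 0.0, 0.3, 0.9]
for bits in (8, 4, 2):
    print(bits, fake_quantize(weights, bits))
```

In this sketch the per-element error is bounded by one quantization step (`scale`), which grows roughly 16x when going from 8 to 4 bits; that growth is why the retraining step matters.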

Contribution

It proposes a novel end-to-end approach combining mixed low-bit quantization, rule-based bit reduction, and integer-only inference modeling for microcontroller deployment.
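The rule-based bit reduction can be pictured as a greedy loop: while the weights exceed the flash budget, lower the precision of the layer that currently occupies the most memory (8 to 4 to 2 bits). The sketch below is a simplified illustration under that assumption. The `Layer` class, the greedy tie-breaking and the weights-only memory model are hypothetical, and the paper's actual rules (including how activation tensors are treated) are not reproduced.

```python
from dataclasses import dataclass

PRECISIONS = (8, 4, 2)  # bit-widths allowed by the method, highest first


@dataclass
class Layer:
    name: str
    num_weights: int
    bits: int = 8

    def weight_bytes(self) -> float:
        return self.num_weights * self.bits / 8


def total_weight_bytes(layers) -> float:
    return sum(layer.weight_bytes() for layer in layers)


def cut_bits_to_fit(layers, budget_bytes):
    """Greedily lower the precision of the most memory-demanding layer until the budget is met."""
    while total_weight_bytes(layers) > budget_bytes:
        reducible = [layer for layer in layers if layer.bits > PRECISIONS[-1]]
        if not reducible:
            raise ValueError("model does not fit even with every layer at the lowest precision")
        largest = max(reducible, key=lambda layer: layer.weight_bytes())
        largest.bits = PRECISIONS[PRECISIONS.index(largest.bits) + 1]
    return layers


# Hypothetical three-layer model and a 600,000-byte flash budget.
model = [Layer("conv1", 1_000), Layer("conv2", 200_000), Layer("fc", 1_000_000)]
cut_bits_to_fit(model, 600_000)
print({layer.name: layer.bits for layer in model})  # {'conv1': 8, 'conv2': 8, 'fc': 2}
```

Cutting the largest layer first removes the most bytes per precision step, so in this sketch small layers such as `conv1` keep 8 bits while the bulk of the savings comes from the biggest tensor.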

Findings

1. Achieved 68% Top-1 accuracy with MobileNetV1 on a device with 2 MB of flash and 512 kB of RAM.
2. Reduced the memory footprint enough to deploy deep networks on microcontrollers.
3. Achieved Top-1 accuracy 8% higher than previously published 8-bit implementations for microcontrollers.
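The 512 kB RAM figure constrains activations rather than weights. Assuming layer-by-layer execution in which only the current layer's input and output tensors must be resident at once (an assumption; the paper's exact memory model may differ), checking the RAM constraint reduces to the arithmetic below. The 112x112x32 activation shape and the helper names are hypothetical.

```python
RAM_BUDGET = 512 * 1024  # 512 kB of RAM, taken here as 524,288 bytes


def tensor_bytes(shape, bits):
    """Bytes needed to store an (H, W, C) activation tensor at `bits` per element."""
    height, width, channels = shape
    return height * width * channels * bits // 8


def layer_ram_bytes(in_shape, in_bits, out_shape, out_bits):
    """RAM needed while running one layer: its input and output buffers together."""
    return tensor_bytes(in_shape, in_bits) + tensor_bytes(out_shape, out_bits)


shape = (112, 112, 32)  # hypothetical activation shape
for in_bits, out_bits in [(8, 8), (8, 4), (4, 4)]:
    need = layer_ram_bytes(shape, in_bits, shape, out_bits)
    print(in_bits, out_bits, need, need <= RAM_BUDGET)
# 8 8 802816 False
# 8 4 602112 False
# 4 4 401408 True
```

Only cutting both tensors to 4 bits brings this hypothetical layer under budget; choosing which tensors to cut is the kind of decision a memory-driven procedure automates.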

Abstract

This paper presents a novel end-to-end methodology for enabling the deployment of low-error deep networks on microcontrollers. To fit the memory and computational limitations of resource-constrained edge-devices, we exploit mixed low-bitwidth compression, featuring 8, 4 or 2-bit uniform quantization, and we model the inference graph with integer-only operations. Our approach aims at determining the minimum bit precision of every activation and weight tensor given the memory constraints of a device. This is achieved through a rule-based iterative procedure, which cuts the number of bits of the most memory-demanding layers, aiming at meeting the memory constraints. After a quantization-aware retraining step, the fake-quantized graph is converted into an integer-only inference model by inserting the Integer Channel-Normalization (ICN) layers, which introduce a negligible loss as…
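The ICN layers replace the floating-point normalization and rescaling between layers with integer arithmetic. A common way to do this, assumed here rather than taken from the paper, is to fold each output channel's real-valued scale into an integer multiplier m and a right shift n, so that scale ≈ m / 2^n. The helper names, the 31-bit multiplier width and the round-to-nearest choice below are hypothetical.

```python
import math


def to_fixed_point(real_scale, mult_bits=31):
    """Approximate a positive real scale as m / 2**n with an integer m < 2**mult_bits."""
    if real_scale <= 0:
        raise ValueError("scale must be positive")
    mantissa, exponent = math.frexp(real_scale)  # real_scale == mantissa * 2**exponent, 0.5 <= mantissa < 1
    m = round(mantissa * (1 << mult_bits))
    n = mult_bits - exponent
    if m == 1 << mult_bits:  # rounding pushed the mantissa up to 1.0
        m //= 2
        n -= 1
    if n <= 0:
        raise ValueError("scale too large for this sketch")
    return m, n


def requantize(acc, bias, m, n, out_bits):
    """Integer-only: round((acc + bias) * m / 2**n), then clamp to the unsigned out_bits range."""
    y = ((acc + bias) * m + (1 << (n - 1))) >> n  # adding 2**(n-1) before the shift rounds to nearest
    return max(0, min((1 << out_bits) - 1, y))


m, n = to_fixed_point(0.0123)  # hypothetical per-channel scale
print(requantize(1000, 20, m, n, out_bits=8))  # 13 (1020 * 0.0123 = 12.546)
print(requantize(50000, 0, m, n, out_bits=4))  # 15 (clamped to the 4-bit maximum)
```

In this sketch, m and n are computed offline for each channel, so the device only performs an integer add, multiply and shift per output value.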

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Wireless Signal Modulation Classification

Methods: Depthwise Convolution · Pointwise Convolution · Average Pooling · Global Average Pooling · Depthwise Separable Convolution · 1x1 Convolution · Batch Normalization · Dense Connections · Softmax