Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers
Manuele Rusci, Alessandro Capotondi, Luca Benini

TL;DR
This paper introduces a memory-driven, mixed low-bitwidth quantization method for deploying deep neural networks on microcontrollers. An iterative, rule-based procedure lowers the bitwidth (8, 4 or 2 bits) of the most memory-demanding tensors until the model fits the device's flash and RAM, and quantization-aware retraining then recovers accuracy.
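The following is a minimal sketch of what 8-, 4- and 2-bit uniform quantization of a tensor means. It also shows the "fake-quantized" round trip used during quantization-aware training. It assumes a generic min-max asymmetric quantizer with a float offset. This summary does not describe the paper's exact weight and activation quantizers, and the function names are hypothetical.

```python
def quantize(values: list[float], bits: int) -> tuple[list[int], float, float]:
    """Uniformly map floats to unsigned integer codes in [0, 2**bits - 1].

    Min-max asymmetric scheme (an assumption): the range [min, max] is split
    into 2**bits - 1 equal steps of size `scale`, starting at `offset`.
    """
    offset, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - offset) / levels if hi > offset else 1.0
    # Round to the nearest step (Python's round ties to even), then clamp.
    codes = [min(levels, max(0, round((v - offset) / scale))) for v in values]
    return codes, scale, offset


def dequantize(codes: list[int], scale: float, offset: float) -> list[float]:
    """Map integer codes back to real values."""
    return [offset + c * scale for c in codes]


def fake_quantize(values: list[float], bits: int) -> list[float]:
    """Quantize-then-dequantize: floats restricted to 2**bits levels."""
    codes, scale, offset = quantize(values, bits)
    return dequantize(codes, scale, offset)
```

For example, `quantize([-1.0, -0.2, 0.3, 1.0], 2)` yields the codes `[0, 1, 2, 3]`, so each value is stored in 2 bits instead of 32. With a float offset, zero is not guaranteed to be exactly representable. Integer-only deployments commonly use an integer zero point instead.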
Contribution
It proposes a novel end-to-end approach combining mixed low-bit quantization, rule-based bit reduction, and integer-only inference modeling for microcontroller deployment.
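The rule-based bit reduction can be illustrated with a simplified, hypothetical loop that covers the weights only. Every layer starts at 8 bits. While the total weight footprint exceeds the flash budget, the most memory-demanding layer that can still be cut drops to the next lower precision (8, then 4, then 2 bits). This is a sketch under stated assumptions, not the paper's exact procedure: flash holds only weights, there is no tie margin, there is no activation rule, and retraining is not modeled.

```python
BITWIDTHS = (8, 4, 2)  # allowed precisions, highest first


def weight_bytes(num_params: int, bits: int) -> float:
    """Flash footprint of a weight tensor stored at `bits` per element."""
    return num_params * bits / 8


def cut_weight_bits(layer_params: list[int], flash_budget: int) -> list[int]:
    """Lower per-layer weight precision until all weights fit in flash.

    Hypothetical simplification: only weights are considered, and each
    iteration cuts the single largest layer by one precision step.
    """
    bits = [BITWIDTHS[0]] * len(layer_params)
    while sum(weight_bytes(p, b) for p, b in zip(layer_params, bits)) > flash_budget:
        # Layers that can still drop to a lower bitwidth.
        candidates = [i for i, b in enumerate(bits) if b > BITWIDTHS[-1]]
        if not candidates:
            raise ValueError("budget not reachable even at the lowest bitwidth")
        # Rule: cut the most memory-demanding layer first.
        worst = max(candidates, key=lambda i: weight_bytes(layer_params[i], bits[i]))
        bits[worst] = BITWIDTHS[BITWIDTHS.index(bits[worst]) + 1]
    return bits
```

For example, `cut_weight_bits([1000, 50_000, 200_000, 10_000], flash_budget=150_000)` returns `[8, 8, 2, 8]`. The 200,000-parameter layer is cut twice (to 100,000 B and then 50,000 B), bringing the total to 111,000 B. Cutting the largest layer first removes the most bytes per precision step. In the paper, a quantization-aware retraining step follows to recover accuracy.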
Findings
Achieved 68% Top-1 accuracy with MobileNetV1 on a device with 2 MB of flash and 512 KB of RAM.
Reduced the memory footprint of deep networks enough to fit within microcontroller flash and RAM limits.
Reached a Top-1 accuracy 8% higher than previously published 8-bit implementations for microcontrollers.
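The 2 MB and 512 KB figures can be turned into a simple feasibility check. The sketch below assumes that all weights are stored in flash and that each layer's input and output activations must be resident in RAM at the same time. This is our assumption, not a statement of the paper's memory model. The sketch also reads "MB" and "KB" as binary units. The layer sizes in the example lines come from the standard full-width 224×224 MobileNetV1 (about 4.2 million parameters, and 32 filters in the first stride-2 convolution). They do not come from the configuration evaluated in the paper.

```python
FLASH_BYTES = 2 * 1024 * 1024  # 2 MB of flash (binary units assumed)
RAM_BYTES = 512 * 1024         # 512 KB of RAM (binary units assumed)


def tensor_bytes(num_elements: int, bits: int) -> int:
    """Bytes needed to store num_elements values at `bits` each, rounded up."""
    return (num_elements * bits + 7) // 8


def flash_fits(weights: list[tuple[int, int]], flash: int = FLASH_BYTES) -> bool:
    """weights: (num_weights, weight_bits) per layer; all weights live in flash."""
    return sum(tensor_bytes(n, b) for n, b in weights) <= flash


def ram_fits(acts: list[tuple[int, int, int, int]], ram: int = RAM_BYTES) -> bool:
    """acts: (in_elems, in_bits, out_elems, out_bits) per layer; a layer's
    input and output activation buffers must fit in RAM together (assumption)."""
    return all(
        tensor_bytes(i, ib) + tensor_bytes(o, ob) <= ram for i, ib, o, ob in acts
    )


# ~4.2M weights: uniform 8-bit (4,200,000 B) and 4-bit (2,100,000 B) both
# exceed 2,097,152 B, so some layers would need 2 bits.
print(flash_fits([(4_200_000, 8)]), flash_fits([(4_200_000, 4)]))  # False False
# First conv: 224*224*3 = 150,528 inputs, 112*112*32 = 401,408 outputs.
print(ram_fits([(150_528, 8, 401_408, 8)]))  # False: 551,936 B > 524,288 B
print(ram_fits([(150_528, 8, 401_408, 4)]))  # True: 351,232 B
```

Under these assumptions, no single uniform precision fits a network of this size. That is the situation in which per-tensor mixed precision pays off.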
Abstract
This paper presents a novel end-to-end methodology for enabling the deployment of low-error deep networks on microcontrollers. To fit the memory and computational limitations of resource-constrained edge devices, we exploit mixed low-bitwidth compression, featuring 8, 4 or 2-bit uniform quantization, and we model the inference graph with integer-only operations. Our approach aims at determining the minimum bit precision of every activation and weight tensor given the memory constraints of a device. This is achieved through a rule-based iterative procedure, which cuts the number of bits of the most memory-demanding layers, aiming to meet the memory constraints. After a quantization-aware retraining step, the fake-quantized graph is converted into an integer-only inference model by inserting the Integer Channel-Normalization (ICN) layers, which introduce a negligible loss as…
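The integer-only conversion can be pictured as a per-channel requantization. Under this sketch's assumptions, a real-valued factor m combines the input and weight scales with the folded batch-norm gain, divided by the output scale. That factor is approximated as an integer multiplier M0 and a right shift N0, so each output needs only an integer add, multiply, shift and clamp. This is a simplified, hypothetical rendering of the ICN idea, not the paper's exact formulation. The bias `bias_q` is assumed to be already expressed in the accumulator's integer domain, and m is assumed positive.

```python
import math


def to_fixed_point(m: float, frac_bits: int = 31) -> tuple[int, int]:
    """Approximate a positive real m as m0 / 2**n0, with m0 < 2**frac_bits."""
    if m <= 0:
        raise ValueError("m must be positive")
    mantissa, exp = math.frexp(m)  # m = mantissa * 2**exp, 0.5 <= mantissa < 1
    m0 = round(mantissa * (1 << frac_bits))
    if m0 == 1 << frac_bits:  # rounding carried into the next power of two
        m0 >>= 1
        exp += 1
    n0 = frac_bits - exp
    if n0 < 0:
        raise ValueError("m too large for a right-shift-only representation")
    return m0, n0


def icn_requantize(acc: int, bias_q: int, m0: int, n0: int,
                   z_out: int, out_bits: int) -> int:
    """Integer-only output: clamp(z_out + floor((acc + bias_q) * m0 / 2**n0)).

    Python's >> on negative ints is an arithmetic (flooring) shift.
    """
    y = z_out + (((acc + bias_q) * m0) >> n0)
    return max(0, min(2 ** out_bits - 1, y))


m0, n0 = to_fixed_point(0.0123)  # hypothetical per-channel factor
print(icn_requantize(1000, -200, m0, n0, z_out=0, out_bits=4))  # 9 (800 * 0.0123 = 9.84)
```

Python integers are unbounded. On a microcontroller, the product of a 32-bit accumulator and a 31-bit multiplier needs a 64-bit intermediate. This sketch floors the shifted result; a real kernel may round instead.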
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Wireless Signal Modulation Classification
Methods: Depthwise Convolution · Pointwise Convolution · Average Pooling · Global Average Pooling · Depthwise Separable Convolution · 1x1 Convolution · Batch Normalization · Dense Connections · Softmax
