Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers

Manuele Rusci; Marco Fariselli; Alessandro Capotondi; Luca Benini

arXiv:2008.05124·cs.LG·August 13, 2020

TL;DR

This paper presents an automated mixed-precision quantization flow in which a reinforcement learning agent picks a uniform quantization level (2, 4, or 8 bits) for each weight and activation tensor, so that deep neural networks fit the tight RAM and FLASH budgets of tiny microcontrollers while preserving accuracy.

Contribution

It adapts the HAQ framework to the memory and computational characteristics of MCUs: an RL agent searches for mixed-precision quantization policies for DNN weight and activation tensors under RAM and FLASH size constraints.

Findings

01

Achieves comparable accuracy to non-uniform quantization methods.

02

Improves MobileNetV1 accuracy on ImageNet by 4% over 8-bit networks that fit the same memory constraints.

03

Demonstrates the viability of uniform quantization for compressing DNN weights on MCUs.
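
To make "uniform quantization" in the findings above concrete, here is a minimal sketch of a symmetric, per-tensor uniform quantizer at the three bit widths the paper considers. The exact quantizer used by the authors is not given on this page, so the symmetric range, the max-based scale, and the function names `quantize_uniform` and `dequantize` are illustrative assumptions.

```python
ALLOWED_BITS = (2, 4, 8)  # bit widths named in the abstract


def quantize_uniform(values, bits):
    """Map floats to signed integer codes on an evenly spaced grid.

    Assumption: symmetric per-tensor scheme with a max-abs scale; this is a
    generic textbook quantizer, not necessarily the one used in the paper.
    """
    if bits not in ALLOWED_BITS:
        raise ValueError("bits must be one of 2, 4, 8")
    qmax = 2 ** (bits - 1) - 1   # largest code, e.g. 127 for 8 bits
    qmin = -(2 ** (bits - 1))    # smallest code, e.g. -128 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point, then clamp to the representable range.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]
```

Because the grid is evenly spaced, every value is recovered as an integer code times a single scale, which is what makes the scheme suitable for integer-only arithmetic. Fewer bits means a coarser grid: with the toy input `[-1.0, -0.5, 0.0, 0.25, 1.0]`, 2-bit quantization leaves only the codes `[-1, 0, 0, 0, 1]`.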

Abstract

The severe on-chip memory limitations are currently preventing the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even if leveraging an effective 8-bit quantization scheme. To tackle this issue, in this paper we present an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, 8 bits, of individual weight and activation tensors, under the tight constraints on RAM and FLASH embedded memory sizes. We conduct an experimental analysis on MobileNetV1, MobileNetV2 and MNasNet models for Imagenet classification. Concerning the quantization policy search, the RL agent selects quantization policies that maximize the memory utilization. Given…
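
The memory constraint described in the abstract can be illustrated with a small feasibility check for a candidate policy. This is only a sketch: it does not implement the RL search, and the memory model is an assumption (weights are stored in FLASH, and peak RAM is taken as the largest sum of one layer's input and output activation tensors in a purely sequential network). All function names and the toy numbers in the usage note are hypothetical.

```python
ALLOWED_BITS = (2, 4, 8)  # bit widths named in the abstract


def tensor_bytes(num_elements, bits):
    """Bytes needed to store a tensor packed at `bits` bits per element."""
    if bits not in ALLOWED_BITS:
        raise ValueError("bits must be one of 2, 4, 8")
    return (num_elements * bits + 7) // 8  # round up to whole bytes


def flash_bytes(weight_counts, weight_bits):
    """Total weight storage: one entry per layer in each list."""
    return sum(tensor_bytes(n, b) for n, b in zip(weight_counts, weight_bits))


def peak_ram_bytes(act_counts, act_bits):
    """Peak activation memory for a sequential network.

    Assumption: act_counts[i] is the input of layer i and act_counts[i + 1]
    its output, and both must be resident while the layer runs.
    """
    return max(
        tensor_bytes(act_counts[i], act_bits[i])
        + tensor_bytes(act_counts[i + 1], act_bits[i + 1])
        for i in range(len(act_counts) - 1)
    )


def fits_budget(weight_counts, weight_bits, act_counts, act_bits,
                flash_budget, ram_budget):
    """True if the policy satisfies both the FLASH and the RAM constraint."""
    return (flash_bytes(weight_counts, weight_bits) <= flash_budget
            and peak_ram_bytes(act_counts, act_bits) <= ram_budget)
```

For a toy three-layer network with 1000, 4000, and 2000 weights quantized to 8, 4, and 2 bits, `flash_bytes` returns 1000 + 2000 + 500 = 3500 bytes. A search procedure such as the paper's RL agent would only keep policies for which a check of this kind succeeds.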

Taxonomy

Methods

Pointwise Convolution · Depthwise Convolution · Softmax · MobileNetV1 · Average Pooling · Convolution · Sigmoid Activation · Depthwise Separable Convolution · Squeeze-and-Excitation Block