Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

Georg Rutishauser; Francesco Conti; Luca Benini

arXiv:2307.02894 · cs.LG · July 7, 2023

TL;DR

This paper introduces a hybrid search method for mixed-precision neural network quantization that optimizes latency on edge hardware with minimal accuracy loss. Networks deployed on multi-core RISC-V microcontrollers show end-to-end latency reductions of up to 28.6%.

Contribution

It proposes a hybrid search algorithm that pairs a hardware-agnostic differentiable search with a hardware-aware heuristic optimization, yielding mixed-precision configurations that are latency-optimized for a specific hardware target.

Findings

01

Up to 28.6% end-to-end latency reduction on MobileNetV1 and MobileNetV2 with negligible accuracy loss.

02

Effective across a family of multi-core RISC-V microcontroller platforms with different hardware characteristics.

03

Outperforms binary operation count-based search methods.

Abstract

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit…
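The second, hardware-aware stage described in the abstract can be illustrated with a small sketch. The code below assumes a hypothetical per-layer latency table (`LATENCY`, in made-up cycle counts) and a starting configuration of the kind a differentiable search might produce. The rule it applies, raising a layer's bit-width whenever a higher precision is no slower on the target, is an illustrative assumption and not necessarily the heuristic the paper uses. All names and numbers are invented for illustration.

```python
from typing import Dict, List

# Hypothetical latency table: one dict per layer, mapping bit-width -> cycles.
# The numbers are invented; on real hardware they would come from measurement.
LATENCY: List[Dict[int, int]] = [
    {2: 120, 4: 100, 8: 150},  # layer 0: 4-bit is faster than 2-bit here
    {2: 80, 4: 90, 8: 140},    # layer 1: lower precision is always faster
    {2: 200, 4: 200, 8: 260},  # layer 2: 2-bit and 4-bit tie
]


def total_latency(config: List[int], latency: List[Dict[int, int]]) -> int:
    """Sum the per-layer latencies for a given bit-width configuration."""
    return sum(table[bits] for bits, table in zip(config, latency))


def raise_free_bits(config: List[int], latency: List[Dict[int, int]]) -> List[int]:
    """Assumed heuristic: move a layer to a higher precision whenever doing
    so does not increase that layer's latency on the target."""
    result = []
    for bits, table in zip(config, latency):
        best = bits
        for candidate in sorted(table):
            if candidate > best and table[candidate] <= table[best]:
                best = candidate
        result.append(best)
    return result


# Stand-in for the output of a hardware-agnostic search stage.
initial = [2, 2, 2]
refined = raise_free_bits(initial, LATENCY)
print(refined, total_latency(initial, LATENCY), total_latency(refined, LATENCY))
```

In this toy table, two layers gain precision at equal or lower latency, which is the kind of hardware-specific effect a purely operation-count-based cost model would not capture.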

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Brain Tumor Detection and Classification

Methods: Pointwise Convolution · Inverted Residual Block · Softmax · Dense Connections · Global Average Pooling · Depthwise Convolution · Average Pooling · Convolution · Batch Normalization