Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge
Georg Rutishauser, Francesco Conti, Luca Benini

TL;DR
This paper introduces a hybrid search method for mixed-precision quantization of neural networks that optimizes latency on edge hardware with minimal accuracy loss. It is demonstrated on multi-core RISC-V microcontrollers, with up to 28.6% lower end-to-end latency.
Contribution
It proposes a hybrid search algorithm for latency-optimized mixed-precision quantization. A hardware-agnostic differentiable search is followed by a hardware-aware heuristic optimization that tailors the configuration to a specific hardware target.
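The differentiable stage is not detailed here. One common way to make per-layer precision selection differentiable, shown below as an assumption rather than the authors' exact formulation, is to give each layer a softmax over candidate bit-widths. Cost then becomes an expected value that a gradient can flow through, and the most probable bit-width is chosen after search. The candidate set `BITS` and the helper names are hypothetical.

```python
import math

# Hypothetical candidate bit-widths per layer (not taken from the paper).
BITS = (2, 4, 8)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def expected_cost(alphas, macs):
    """Differentiable proxy cost: sum over layers of MACs * E[bit-width].

    alphas: per-layer lists of architecture logits, one logit per entry in BITS.
    macs:   per-layer multiply-accumulate counts.
    In a full search this term would be added, scaled by a weight, to the task
    loss so that gradient descent on the logits trades accuracy against cost.
    """
    total = 0.0
    for logits, n in zip(alphas, macs):
        probs = softmax(logits)
        total += n * sum(p * b for p, b in zip(probs, BITS))
    return total

def discretize(alphas):
    """After search, keep the most probable bit-width for each layer."""
    return [BITS[max(range(len(l)), key=l.__getitem__)] for l in alphas]

if __name__ == "__main__":
    alphas = [[0.1, 2.0, 0.5], [3.0, 0.0, 0.0]]
    print(discretize(alphas))                    # [4, 2]
    print(expected_cost(alphas, [1000, 2000]))
```

Pushing a layer's logits toward 2 bits lowers `expected_cost` smoothly, which is what allows the precision choice to be learned jointly with the weights.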
Findings
Up to 28.6% end-to-end latency reduction on MobileNetV1 and MobileNetV2 with negligible accuracy loss.
Effective across a family of multi-core RISC-V microcontroller platforms with different hardware characteristics.
Outperforms binary operation count-based search methods.
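The binary operation (BOP) count of a layer is commonly defined as its MAC count times the weight bit-width times the activation bit-width. The sketch below shows why a BOP-count objective can disagree with measured latency. The layer names, MAC counts, and latency lookup table are invented for illustration and are not measurements from the paper. Two configurations with identical BOP counts can still differ in latency on a given target.

```python
# Hypothetical two-layer network; MAC counts are made up for illustration.
MACS = {"conv1": 1_000_000, "conv2": 1_000_000}

def bop_count(config):
    """Sum over layers of MACs * weight bits * activation bits."""
    return sum(MACS[layer] * w * a for layer, (w, a) in config.items())

# Hypothetical per-layer latency table in cycles, keyed by
# (layer, weight bits, activation bits). The numbers are invented to show
# that latency need not scale with BOPs, e.g. when a kernel is memory-bound
# or a low-precision path is less optimized for one layer shape.
LATENCY_LUT = {
    ("conv1", 8, 8): 400_000, ("conv1", 4, 8): 300_000, ("conv1", 2, 8): 280_000,
    ("conv2", 8, 8): 400_000, ("conv2", 4, 8): 220_000, ("conv2", 2, 8): 150_000,
}

def latency(config):
    """Estimated end-to-end latency as the sum of per-layer table entries."""
    return sum(LATENCY_LUT[(layer, w, a)] for layer, (w, a) in config.items())

# Each layer maps to (weight bits, activation bits).
config_a = {"conv1": (2, 8), "conv2": (8, 8)}
config_b = {"conv1": (8, 8), "conv2": (2, 8)}

if __name__ == "__main__":
    print(bop_count(config_a), bop_count(config_b))  # 80000000 80000000
    print(latency(config_a), latency(config_b))      # 680000 550000
```

A BOP-driven search would treat both configurations as equivalent, while a latency-aware search would prefer `config_b` on this hypothetical target.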
Abstract
Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit…
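The abstract excerpt does not spell out the hardware-aware heuristic. As a purely illustrative sketch, and not the authors' algorithm, one such refinement takes the configuration from the hardware-agnostic search. It then raises each layer's bit-width as long as the target's measured latency for that layer does not increase, since those extra bits cost no latency and may recover accuracy. The latency table, layer names, and `add_free_bits` are hypothetical.

```python
# Hypothetical per-layer latency table (cycles) for one hardware target,
# keyed by (layer, bit-width). Numbers are invented for illustration.
LATENCY_LUT = {
    ("conv1", 2): 300, ("conv1", 4): 300, ("conv1", 8): 450,
    ("conv2", 2): 200, ("conv2", 4): 260, ("conv2", 8): 260,
}
BITS = (2, 4, 8)

def add_free_bits(config, lut, bits=BITS):
    """Raise each layer's bit-width as far as possible without increasing
    that layer's latency on the target.

    config: dict layer -> bit-width chosen by the hardware-agnostic search;
            bit-widths are assumed to be drawn from `bits`.
    Returns a new dict; the input is not modified.
    """
    refined = {}
    for layer, b in config.items():
        base = lut[(layer, b)]
        # Candidates at or above the current precision that are no slower.
        free = [c for c in bits if c >= b and lut[(layer, c)] <= base]
        refined[layer] = max(free)  # b itself is in `free` since b is in bits
    return refined

def total_latency(config, lut):
    """Sum of per-layer latencies from the lookup table."""
    return sum(lut[(layer, b)] for layer, b in config.items())

if __name__ == "__main__":
    searched = {"conv1": 2, "conv2": 4}
    refined = add_free_bits(searched, LATENCY_LUT)
    print(refined)  # {'conv1': 4, 'conv2': 8}
    print(total_latency(searched, LATENCY_LUT), total_latency(refined, LATENCY_LUT))  # 560 560
```

Because the refinement only uses per-layer measurements from the target, the same searched configuration can be refined differently for platforms with different hardware characteristics.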
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Brain Tumor Detection and Classification
Methods: Pointwise Convolution · Inverted Residual Block · Softmax · Dense Connections · Global Average Pooling · Depthwise Convolution · Average Pooling · Convolution · Batch Normalization
