Efficient CNN Inference on Ultra-Low-Power MCUs via Saturation-Aware Convolution
Shiming Li, Luca Mottola, Yuan Yao, Stefanos Kaxiras

TL;DR
This paper introduces saturation-aware convolution, a technique that rearranges computations in CNNs to detect early saturation, reducing unnecessary calculations on ultra-low-power MCUs without affecting accuracy.
Contribution
It proposes a novel inference method that induces early saturation in CNN computations, significantly reducing inference time on low-power microcontrollers.
Findings
Up to 24% inference time reduction on a Cortex-M0+ MCU
Zero accuracy loss with the proposed method
Effective early saturation detection in quantized CNNs
Abstract
Quantized CNN inference on ultra-low-power MCUs incurs unnecessary computations in neurons that produce saturated output values. These values are too extreme and are eventually clamped to the boundaries allowed by the neuron. Often, the neuron can save time by producing only a value that is extreme enough to lead to the clamped result, instead of completing the computation, without introducing any error. Based on this, we present saturation-aware convolution: an inference technique whereby we alter the order of computations in convolution kernels to induce earlier saturation, and insert value checks to omit unnecessary computations when the intermediate result is sufficiently extreme. Our experimental results show up to 24% inference time savings on a Cortex-M0+ MCU, with zero impact on accuracy.
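The value check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes int8 weights, non-negative uint8 activations (for example, post-ReLU), and clamp bounds `lo` and `hi` expressed directly in the accumulator domain, with requantization left out. The names `precompute_bounds`, `sat_dot`, `rem_min`, `rem_max`, and `X_MAX` are hypothetical. The reordering of computations is assumed to have been applied offline, so the weights and inputs arrive already in the chosen order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X_MAX 255 /* assumed upper bound of a uint8 activation */

/* Offline step (weights are constants): rem_min[i] and rem_max[i] bound the
 * sum of w[j] * x[j] over j >= i, given 0 <= x[j] <= X_MAX.
 * Both arrays hold n + 1 entries; entry n is 0. */
static void precompute_bounds(const int8_t *w, size_t n,
                              int32_t *rem_min, int32_t *rem_max)
{
    rem_min[n] = 0;
    rem_max[n] = 0;
    for (size_t i = n; i-- > 0;) {
        int32_t wi = w[i];
        rem_min[i] = rem_min[i + 1] + (wi < 0 ? wi * X_MAX : 0);
        rem_max[i] = rem_max[i + 1] + (wi > 0 ? wi * X_MAX : 0);
    }
}

/* Dot product with early exit. Returns clamp(bias + sum w[i]*x[i], lo, hi)
 * and stores the number of multiply-accumulates performed in *macs_done. */
static int32_t sat_dot(const int8_t *w, const uint8_t *x, size_t n,
                       int32_t bias, int32_t lo, int32_t hi,
                       const int32_t *rem_min, const int32_t *rem_max,
                       size_t *macs_done)
{
    assert(lo <= hi);
    int32_t acc = bias;
    size_t i = 0;
    while (i < n) {
        acc += (int32_t)w[i] * (int32_t)x[i];
        i++;
        /* Even the most negative remaining terms cannot pull acc below hi. */
        if (acc + rem_min[i] >= hi) { acc = hi; break; }
        /* Even the most positive remaining terms cannot lift acc above lo. */
        if (acc + rem_max[i] <= lo) { acc = lo; break; }
    }
    *macs_done = i;
    if (acc > hi) acc = hi; /* final clamp for the non-early-exit path */
    if (acc < lo) acc = lo;
    return acc;
}
```

Because each exit is taken only when the remaining terms provably cannot bring the accumulator back inside the clamp range, the returned value equals the clamped result of the full computation. In this sketch the check runs after every multiply-accumulate; on a real MCU the check frequency would likely be tuned, since each comparison has its own cost.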
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Low-power high-performance VLSI design
