Efficient CNN Inference on Ultra-Low-Power MCUs via Saturation-Aware Convolution

Shiming Li; Luca Mottola; Yuan Yao; Stefanos Kaxiras

arXiv:2511.05347 · eess.SY · February 27, 2026

Open Access

TL;DR

This paper introduces saturation-aware convolution, a technique that reorders the computations in CNN convolution kernels so that saturation occurs, and is detected, earlier. The remaining computations are then skipped, which reduces inference time on ultra-low-power MCUs without affecting accuracy.

Contribution

The paper proposes an inference method that induces earlier saturation in quantized CNN convolutions and skips the computations that follow, reducing inference time by up to 24% on a Cortex-M0+ MCU.

Findings

01 Up to 24% inference time reduction on a Cortex-M0+ MCU

02 Zero accuracy loss with the proposed method

03 Effective early saturation detection in quantized CNNs

Abstract

Quantized CNN inference on ultra-low-power MCUs incurs unnecessary computations in neurons that produce saturated output values. These values are too extreme and are eventually clamped to the boundaries allowed by the neuron. Often, the neuron can save time by computing only until the value is extreme enough to guarantee the clamped result, instead of completing the computation, without introducing any error. Based on this, we present saturation-aware convolution: an inference technique that alters the order of computations in convolution kernels to induce earlier saturation and inserts value checks to skip the remaining computations once the intermediate result is sufficiently extreme. Our experimental results show up to 24% inference time savings on a Cortex-M0+ MCU, with zero impact on accuracy.
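The idea in the abstract can be illustrated with a minimal Python sketch of a single output neuron. This is not the authors' implementation: the function names, the `[0, 255]` output range, the choice to clamp the raw accumulator (a real quantized pipeline would typically requantize first), and the specific ordering (negative weights first) are all assumptions made for illustration. The sketch also assumes non-negative activations, so that once only non-negative weights remain the accumulator can never decrease and reaching the upper bound is final.

```python
def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def full_dot(weights, activations, bias=0, lo=0, hi=255):
    """Baseline: accumulate every product, then clamp to [lo, hi]."""
    acc = bias
    for w, a in zip(weights, activations):
        acc += w * a
    return clamp(acc, lo, hi)


def saturation_aware_dot(weights, activations, bias=0, lo=0, hi=255):
    """Return (clamped output, MACs performed); the output matches full_dot.

    Assumption (ours, not from the paper): all activations are >= 0, so a
    negative weight can only lower the accumulator and a non-negative weight
    can only raise it.
    """
    if any(a < 0 for a in activations):
        raise ValueError("this sketch requires non-negative activations")
    pairs = list(zip(weights, activations))
    # Reordering step: negative-weight terms go first. On a real MCU this
    # ordering would presumably be fixed offline rather than at run time.
    negative = [p for p in pairs if p[0] < 0]
    non_negative = [p for p in pairs if p[0] >= 0]

    acc, macs = bias, 0
    for w, a in negative:
        acc += w * a
        macs += 1
    for w, a in non_negative:
        # Value check: acc can no longer decrease, so reaching hi is final.
        if acc >= hi:
            return hi, macs
        acc += w * a
        macs += 1
    return clamp(acc, lo, hi), macs


weights = [3, -1, 4, 2, -2, 5]
activations = [50, 10, 60, 20, 5, 40]
print(full_dot(weights, activations))              # 255
print(saturation_aware_dot(weights, activations))  # (255, 4)
```

In this example the full sum is 610, which clamps to 255. The reordered version reaches 370 after four of the six multiply-accumulates and stops there, returning the same clamped value. The mirrored case (non-negative weights first, with an early exit at the lower bound) follows the same reasoning.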

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Low-power high-performance VLSI design