UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs

Ashe Neth; Sawinder Kaur; Mohammad Nur Hossain Khan; Subrata Biswas; Asif Salekin; Bashima Islam

arXiv:2507.07885 · cs.LG · July 11, 2025

TL;DR

UnIT is a lightweight, input-adaptive unstructured pruning method for neural networks on microcontrollers. It cuts MAC operations by up to 82% and energy consumption by up to 84% without retraining.

Contribution

It presents UnIT, a novel inference-time pruning approach that leverages irregular sparsity and input-specific activation patterns for efficient MCU deployment without retraining.

Findings

01 Achieves up to 82% MAC reduction on an MSP430 microcontroller.

02 Reduces inference time and energy consumption by up to 84% each.

03 Maintains accuracy within 7% of the original models under pruning.

Abstract

Existing pruning methods are typically applied during training or compile time and often rely on structured sparsity. While compatible with low-power microcontrollers (MCUs), structured pruning underutilizes the opportunity for fine-grained efficiency on devices without SIMD support or parallel compute. To address these limitations, we introduce UnIT (Unstructured Inference-Time pruning), a lightweight method that dynamically identifies and skips unnecessary multiply-accumulate (MAC) operations during inference, guided by input-specific activation patterns. Unlike structured pruning, UnIT embraces irregular sparsity and does not require retraining or hardware specialization. It transforms pruning decisions into lightweight comparisons, replacing multiplications with threshold checks and approximated divisions. UnIT further optimizes compute by reusing threshold computations across…
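The comparison trick described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one plausible reading: a MAC is skipped when its product would be negligible, |w · x| < T, and that test is rewritten as |w| < T / |x| so the inner loop needs only a comparison. The function name `pruned_dense`, the weight layout `W[i][j]`, and the `threshold` parameter are all hypothetical, and the division here is exact where the paper uses an approximation.

```python
def pruned_dense(x, W, threshold):
    """Dense layer that skips MACs whose product would be below `threshold`.

    x: list of input activations.
    W: weights, W[i][j] connects input i to output j (assumed layout).
    Returns (outputs, number_of_MACs_actually_performed).
    """
    n_out = len(W[0])
    y = [0.0] * n_out
    macs = 0
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # every product with this input is zero
        # One division per input activation; an MCU version would
        # approximate this (exact division is used here for clarity).
        bound = threshold / abs(xi)
        for j in range(n_out):
            w = W[i][j]
            if abs(w) < bound:
                continue  # |w * xi| < threshold, so skip the multiply
            y[j] += w * xi
            macs += 1
    return y, macs


x = [2.0, 0.0, 0.5]
W = [[0.5, 0.01],
     [1.0, 1.0],
     [0.1, 0.4]]
y, macs = pruned_dense(x, W, threshold=0.1)
print(y, macs)
```

In this example only 2 of the 6 possible MACs run: the zero input is skipped outright, and two small products fall below the threshold. Because the skip decision depends on each `xi`, the sparsity pattern changes with every input, which is what makes the pruning input-adaptive and unstructured.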

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques