UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs
Ashe Neth, Sawinder Kaur, Mohammad Nur Hossain Khan, Subrata Biswas, Asif Salekin, Bashima Islam

TL;DR
UnIT introduces a lightweight, input-adaptive unstructured pruning method for neural networks on microcontrollers, significantly reducing MAC operations and energy consumption without retraining.
Contribution
The paper presents UnIT, an inference-time pruning approach that exploits irregular sparsity and input-specific activation patterns for efficient MCU deployment without retraining.
Findings
Achieves up to 82% MAC reduction on an MSP430 microcontroller.
Reduces inference time and energy consumption by up to 84% each.
Maintains accuracy within 7% of original models under pruning.
Abstract
Existing pruning methods are typically applied during training or compile time and often rely on structured sparsity. While compatible with low-power microcontrollers (MCUs), structured pruning underutilizes the opportunity for fine-grained efficiency on devices without SIMD support or parallel compute. To address these limitations, we introduce UnIT (Unstructured Inference-Time pruning), a lightweight method that dynamically identifies and skips unnecessary multiply-accumulate (MAC) operations during inference, guided by input-specific activation patterns. Unlike structured pruning, UnIT embraces irregular sparsity and does not require retraining or hardware specialization. It transforms pruning decisions into lightweight comparisons, replacing multiplications with threshold checks and approximated divisions. UnIT further optimizes compute by reusing threshold computations across…
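The idea of turning a pruning decision into a comparison can be illustrated with a minimal C sketch. It is not the authors' implementation: it assumes a fully connected layer with 16-bit fixed-point weights and activations, a single global threshold `t`, and a skip rule of the form |w·x| < t, rewritten as |w| < t/|x| so that the inner loop only compares. The names `iabs`, `approx_div`, and `dense_skip` are hypothetical, and the shift-based division is just one possible way to approximate a division cheaply. Reusing one per-input threshold across all outputs is likewise an assumption made for this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: absolute value for 32-bit integers. */
static int32_t iabs(int32_t v) { return v < 0 ? -v : v; }

/* Hypothetical helper: approximate t / ax (ax >= 1) with a right shift by
   floor(log2(ax)). The result is never smaller than the exact quotient,
   so this sketch prunes slightly more aggressively than exact division. */
static int32_t approx_div(int32_t t, int32_t ax)
{
    int shift = 0;
    while (ax > 1) { ax >>= 1; shift++; }
    return t >> shift;
}

/* Fully connected layer y = W x (W is row-major, n_out x n_in) that skips
   MACs whose estimated contribution |w * x| falls below the threshold t.
   Returns the number of MACs actually executed. */
static unsigned dense_skip(const int16_t *w, const int16_t *x, int32_t *y,
                           int n_in, int n_out, int32_t t)
{
    unsigned macs = 0;
    for (int o = 0; o < n_out; o++) y[o] = 0;

    for (int i = 0; i < n_in; i++) {
        int32_t ax = iabs(x[i]);
        if (ax == 0) continue;              /* zero activation: skip the column */

        /* One threshold per input activation, reused for every output. */
        int32_t w_min = approx_div(t, ax);

        for (int o = 0; o < n_out; o++) {
            int32_t wv = w[o * n_in + i];
            if (iabs(wv) < w_min) continue; /* comparison instead of multiply */
            y[o] += wv * (int32_t)x[i];
            macs++;
        }
    }
    return macs;
}
```

With `w = {100, 1, -3, 50}`, `x = {4, 0}`, and `t = 64`, the per-input threshold for `x[0]` is 16, so only the weight 100 triggers a multiplication; the weight -3 and the entire column for the zero activation are skipped.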
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques
