msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
Zhaolan Huang, Emmanuel Baccelli

TL;DR
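This paper introduces msf-CNN, a patch-based fusion technique for CNNs that searches for optimal fusion settings to optimize data flow across layers. On microcontrollers, it can achieve inference using 50% less RAM than prior CNN fusion methods, with the aim of keeping inference latency within real-time constraints for TinyML applications.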
Contribution
msf-CNN efficiently searches the space of CNN fusion configurations, represented as a directed acyclic graph, and identifies a wider set of solutions than previous work on CNN fusion for MCUs, significantly reducing memory usage on microcontrollers.
Findings
Can achieve inference using 50% less RAM than prior CNN fusion methods for MCUs.
A published implementation runs on various microcontrollers (ARM Cortex-M, RISC-V, ESP32).
Offers increased flexibility for TinyML system design.
Abstract
AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are essential to fit within an MCU's tiny memory budget, e.g., 128 kB of RAM. However, inference latency must remain small to fit real-time constraints. One approach to this problem is patch-based fusion, which aims to optimize data flows across neural network layers. In this paper, we introduce msf-CNN, a novel technique that efficiently finds optimal fusion settings for convolutional neural networks (CNNs) by walking through the fusion solution space represented as a directed acyclic graph. Compared to previous work on CNN fusion for MCUs, msf-CNN identifies a wider set of solutions. We published an implementation of msf-CNN running on various microcontrollers (ARM Cortex-M, RISC-V, ESP32). We show that msf-CNN can achieve inference using 50% less RAM…
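To make the idea of walking a fusion solution space represented as a directed acyclic graph more concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes a hypothetical encoding in which node i is the boundary before layer i, an edge (i, j) means "run layers i to j-1 as one fused block", and each edge carries an illustrative peak-RAM figure. The function name `min_peak_ram_path` and all numbers are invented for illustration. The sketch minimizes only peak RAM; a real search would also have to weigh the extra computation that fusion can introduce, since latency must stay small.

```python
def min_peak_ram_path(num_nodes, edge_ram):
    """Find the path from node 0 to the last node whose largest edge cost is smallest.

    num_nodes: number of layer boundaries (layers + 1).
    edge_ram:  dict mapping (i, j) with i < j to the assumed peak RAM (kB)
               of running layers i..j-1 as one block (j == i + 1 means unfused).
    Returns (peak_ram, path), where path is the list of boundaries visited.
    """
    inf = float("inf")
    best = [inf] * num_nodes   # best[j]: smallest achievable peak RAM to reach j
    prev = [None] * num_nodes  # prev[j]: predecessor of j on that best path
    best[0] = 0

    # Node indices are already a topological order because every edge has i < j.
    for j in range(1, num_nodes):
        for (i, k), ram in edge_ram.items():
            if k != j or i >= j:
                continue
            candidate = max(best[i], ram)  # peak of a path = its worst block
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i

    if best[-1] == inf:
        raise ValueError("no fusion configuration reaches the final layer")

    # Walk predecessors back from the last node to recover the chosen blocks.
    path = [num_nodes - 1]
    while path[-1] != 0:
        path.append(prev[path[-1]])
    path.reverse()
    return best[-1], path


# Hypothetical 4-layer network; all RAM figures are made up for illustration.
example_edges = {
    (0, 1): 96, (1, 2): 160, (2, 3): 128, (3, 4): 64,  # unfused layers
    (0, 2): 80, (1, 3): 112, (2, 4): 90,               # two-layer fused blocks
    (0, 3): 72,                                        # three-layer fused block
}

peak, path = min_peak_ram_path(5, example_edges)
print(peak, path)
```

In this toy instance the unfused chain peaks at 160 kB, while the selected path fuses layers 0 to 2 into one block and leaves layer 3 unfused. Adding a second edge weight for compute cost and constraining it would turn this into the kind of RAM-versus-latency trade-off the abstract describes.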
Taxonomy
Topics: Advanced Neural Network Applications · Big Data and Digital Economy · Parallel Computing and Optimization Techniques
Methods: Sparse Evolutionary Training
