Split CNN Inference on Networked Microcontrollers

Junyu Lu; Shashwath Suresh; Hao Liu; Qi Hong; Qing Wang

arXiv:2605.09357 · cs.DC · May 12, 2026

TL;DR

This paper introduces a split inference system for CNNs on networked microcontrollers. Several devices execute one model collaboratively, which lowers peak RAM usage per device while maintaining inference latency.

Contribution

The paper proposes splitting CNN inference on MCUs at sub-layer granularity (kernel-wise and neuron-wise) rather than at layer boundaries, distributing both model parameters and intermediate activations across devices to reduce peak RAM usage per device.

Findings

01. Enables multi-MCU inference for CNNs that were previously infeasible on a single device.

02. Reduces peak RAM usage per MCU while maintaining inference latency.

03. Successfully tested with MobileNetV2 on up to 8 MCUs.

Abstract

Running deep neural networks on microcontroller units (MCUs) is severely constrained by limited memory resources. While TinyML techniques reduce model size and computation, they often fail in practice due to excessive peak Random Access Memory (RAM) usage during inference, which is dominated by intermediate activations. As a result, many models remain infeasible on standalone MCUs. In this work, we present a fine-grained split inference system for networked MCUs that enables collaborative inference of Convolutional Neural Network (CNN) models across multiple devices. Our key insight is that breaking the memory bottleneck requires splitting inference at sub-layer granularity rather than at layer boundaries. We reinterpret pre-trained models to enable kernel-wise and neuron-wise partitioning, and distribute both model parameters and intermediate activations across multiple MCUs. A lightweight,…
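The kernel-wise and neuron-wise partitioning named in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: it assumes a plain stride-1 convolution without padding and a bias-free fully connected layer, runs every "device" in a single process, and ignores communication, quantization, and scheduling. All function names (`conv2d`, `dense`, `split_evenly`, `split_conv2d`, `split_dense`) are hypothetical. The point it shows is that each device can hold only a subset of the kernels (or neurons) and produce only the matching slice of the output, and that the concatenated slices equal the unsplit result.

```python
from typing import List

Matrix = List[List[float]]  # one 2-D feature map (H x W)
Tensor = List[Matrix]       # channels x H x W


def conv2d(x: Tensor, kernels: List[Tensor]) -> Tensor:
    """Stride-1 convolution without padding.

    kernels[k][c] is the 2-D filter that output channel k applies
    to input channel c.
    """
    h, w = len(x[0]), len(x[0][0])
    out = []
    for kernel in kernels:
        kh, kw = len(kernel[0]), len(kernel[0][0])
        fmap = [[sum(kernel[c][i][j] * x[c][r + i][s + j]
                     for c in range(len(x))
                     for i in range(kh)
                     for j in range(kw))
                 for s in range(w - kw + 1)]
                for r in range(h - kh + 1)]
        out.append(fmap)
    return out


def dense(x: List[float], weights: List[List[float]]) -> List[float]:
    """Bias-free fully connected layer; each row of weights is one neuron."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]


def split_evenly(items: list, n_devices: int) -> List[list]:
    """Contiguous shards whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_devices)
    shards, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards


def split_conv2d(x: Tensor, kernels: List[Tensor], n_devices: int) -> Tensor:
    """Kernel-wise split: each simulated device stores only its shard of
    kernels and computes only the corresponding output channels."""
    out = []
    for shard in split_evenly(kernels, n_devices):
        out.extend(conv2d(x, shard))
    return out


def split_dense(x: List[float], weights: List[List[float]],
                n_devices: int) -> List[float]:
    """Neuron-wise split: each simulated device stores only its shard of
    weight rows and computes only the corresponding outputs."""
    out = []
    for shard in split_evenly(weights, n_devices):
        out.extend(dense(x, shard))
    return out
```

In this sketch each device stores roughly 1/N of a layer's weights and 1/N of its output activations, but it still reads the full input to the layer. How the actual system moves activations between MCUs is not covered here.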

Code & Models

Repositories

shashsuresh/split-inference-on-MCUs (GitHub)
