MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

TL;DR
MCUNetV2 combines patch-based inference with neural architecture search to cut the peak memory of tiny deep learning on microcontrollers. This enables higher accuracy and new vision applications on MCUs, such as object detection.
Contribution
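It proposes patch-by-patch inference scheduling, which runs the memory-heavy early layers on one small spatial region of the feature map at a time to cut peak memory. It also uses neural architecture search to redistribute the receptive field so that the computation overhead this scheduling introduces stays small.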
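Patch-by-patch scheduling can be illustrated with a toy model. The sketch below is not the MCUNetV2 implementation. It assumes a hypothetical single-channel stage of unpadded convolutions (`LAYERS`), hypothetical helpers (`conv_relu`, `run_stage`, `input_range`, `patch_based`), and a simplified memory accounting: per layer, input plus output activations, with the full stage output buffer kept resident in patch mode. Each output patch is mapped back to the input region it depends on (the patch plus its halo), computed independently, and written into the output buffer. The result matches whole-map execution, while the peak activation size drops.

```python
import random

# Hypothetical toy stage: (kernel_size, stride) per layer, single channel,
# "valid" (unpadded) convolutions. Not the MCUNetV2 architecture.
LAYERS = [(3, 1), (3, 2), (3, 1), (3, 2)]


def conv_relu(x, w, stride):
    """Naive unpadded 2D cross-correlation followed by ReLU."""
    k = len(w)
    ho = (len(x) - k) // stride + 1
    wo = (len(x[0]) - k) // stride + 1
    return [[max(0.0, sum(x[i * stride + a][j * stride + b] * w[a][b]
                          for a in range(k) for b in range(k)))
             for j in range(wo)]
            for i in range(ho)]


def size(t):
    """Number of activations in a 2D feature map."""
    return len(t) * len(t[0])


def run_stage(x, weights, record=None):
    """Run every layer; optionally record input+output activation size per layer."""
    for (_, s), w in zip(LAYERS, weights):
        y = conv_relu(x, w, s)
        if record is not None:
            record.append(size(x) + size(y))
        x = y
    return x


def input_range(start, end):
    """Map final-output indices [start, end) back to the input indices they need."""
    for k, s in reversed(LAYERS):
        start, end = start * s, (end - 1) * s + k
    return start, end


def patch_based(x, weights, out_h, out_w, grid=2):
    """Run the stage patch by patch; return (output, peak activation size)."""
    out = [[0.0] * out_w for _ in range(out_h)]  # full output buffer stays resident
    rows = [out_h * i // grid for i in range(grid + 1)]
    cols = [out_w * i // grid for i in range(grid + 1)]
    peak = 0
    for r0, r1 in zip(rows, rows[1:]):
        for c0, c1 in zip(cols, cols[1:]):
            ir0, ir1 = input_range(r0, r1)
            ic0, ic1 = input_range(c0, c1)
            crop = [row[ic0:ic1] for row in x[ir0:ir1]]  # patch plus its halo
            sizes = []
            patch = run_stage(crop, weights, record=sizes)
            for i, prow in enumerate(patch):
                out[r0 + i][c0:c1] = prow
            peak = max(peak, max(sizes) + out_h * out_w)
    return out, peak


rng = random.Random(0)
weights = [[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)] for k, _ in LAYERS]
image = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(64)]

full_sizes = []
full = run_stage(image, weights, record=full_sizes)
patched, patch_peak = patch_based(image, weights, len(full), len(full[0]))
print(patched == full, max(full_sizes), patch_peak)  # True 7940 2763
```

In this toy setting, the whole-map peak comes from the first layer (64×64 input plus 62×62 output). The patch schedule only ever holds a 37×37 input patch, its 35×35 output, and the 13×13 stage output. The overlapping halos around each patch are the source of the recomputation that the paper then reduces.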
Findings
Reduces peak memory usage by 4-8x on existing networks.
Sets a record ImageNet accuracy on MCU (71.8%) and exceeds 90% accuracy on the visual wake words dataset under only 32kB SRAM.
Achieves 16.9% higher mAP on Pascal VOC than the previous state-of-the-art result.
Abstract
Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first several blocks have an order of magnitude larger memory usage than the rest of the network. To alleviate this issue, we propose a generic patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory. However, naive implementation brings overlapping patches and computation overhead. We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead. Manually redistributing the receptive field is difficult. We automate the process with neural architecture search to jointly optimize the neural architecture…
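The overhead from overlapping patches grows with the receptive field of the stage that runs patch by patch, and this is the quantity network redistribution targets. The sketch below uses hypothetical helpers (`receptive_field`, `per_layer_lengths`, `mac_overhead`) and toy single-channel stages with unpadded convolutions. It counts the extra multiply-accumulates (MACs) that a 2×2 patch grid spends on overlapping halos for a 3×3-kernel stage versus a 7×7-kernel stage. It illustrates the trend only and does not reproduce MCUNetV2's search or its reported numbers.

```python
# Hypothetical stages: (kernel_size, stride) per layer, "valid" padding,
# single channel, so a layer's MACs = output pixels * kernel_size**2.
SMALL_RF = [(3, 1), (3, 2), (3, 1), (3, 2)]
LARGE_RF = [(7, 1), (7, 2), (7, 1), (7, 2)]


def receptive_field(layers):
    """Receptive field (in input pixels) of one final-output pixel."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf


def per_layer_lengths(start, end, layers):
    """Extent each layer must compute (one axis) to produce final range [start, end)."""
    lengths = []
    for k, s in reversed(layers):
        lengths.append(end - start)
        start, end = start * s, (end - 1) * s + k
    return lengths[::-1]


def mac_overhead(layers, in_size=64, grid=2):
    """Extra MACs of grid x grid patch execution relative to whole-map execution."""
    n, full = in_size, 0
    for k, s in layers:
        n = (n - k) // s + 1
        full += n * n * k * k
    bounds = [n * i // grid for i in range(grid + 1)]
    # Per layer, sum the patch extents along one axis; the 2D patch areas
    # summed over a square grid factorize into the square of that sum.
    totals = [0] * len(layers)
    for a, b in zip(bounds, bounds[1:]):
        for idx, length in enumerate(per_layer_lengths(a, b, layers)):
            totals[idx] += length
    patched = sum(t * t * k * k for t, (k, _) in zip(totals, layers))
    return patched / full - 1


for name, layers in [("small RF", SMALL_RF), ("large RF", LARGE_RF)]:
    print(f"{name}: RF={receptive_field(layers)}, overhead={mac_overhead(layers):.1%}")
# small RF: RF=13, overhead=11.2%
# large RF: RF=37, overhead=92.6%
```

With the same 2×2 grid, the stage with the larger receptive field recomputes far more of its halo. Keeping the patch-based stage's receptive field small and moving receptive field and FLOPs to later stages, which run on the whole (already small) feature map, is what keeps this overhead low.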
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing
