MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs

Junfeng Gong; Cheng Liu; Long Cheng; Huawei Li; Xiaowei Li

arXiv:2407.18267·cs.AR·July 29, 2024

TL;DR

MCU-MixQ is a hardware/software co-designed framework that optimizes mixed-precision neural networks for microcontrollers by combining SIMD packing with neural architecture search, achieving 2.1× and 1.4× speedups over CMix-NN and MCUNet, respectively, under MCU resource constraints.

Contribution

The paper introduces a novel SIMD packing technique and a co-optimized NAS-based framework for efficient mixed-precision neural network deployment on MCUs.

Findings

1. Achieves 2.1× speedup over CMix-NN
2. Achieves 1.4× speedup over MCUNet
3. Effectively balances neural network accuracy and performance

Abstract

Mixed-precision neural networks (MPNNs), which use just enough data width for neural network processing, are an effective approach to meeting the stringent resource constraints of MCUs, including memory and computing. Nevertheless, MCU-class ISAs still lack sub-byte and mixed-precision SIMD operations, and the limited computing capability of MCUs remains underutilized, which further aggravates the computing bound encountered in neural network processing. As a result, the benefits of MPNNs cannot be fully unleashed. In this work, we propose to pack multiple low-bitwidth arithmetic operations within a single instruction multiple data (SIMD) instruction in typical MCUs, and then develop an efficient convolution operator by exploring both the data parallelism and computing parallelism in convolution along with the proposed SIMD packing. Finally, we further leverage Neural…
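The packing idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: it assumes unsigned 4-bit operands, a 32-bit register emulated with a mask, and two 16-bit lanes. The helper names `pack2` and `packed_mul` are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch only: assumes unsigned 4-bit operands and a 32-bit
# register split into two 16-bit lanes. Not the operator from the paper.

LANE_BITS = 16                      # assumed lane width
LANE_MASK = (1 << LANE_BITS) - 1
REG_MASK = 0xFFFFFFFF               # emulate a 32-bit MCU register


def pack2(a0: int, a1: int) -> int:
    """Place two unsigned low-bitwidth values in separate lanes of one word."""
    return (a0 | (a1 << LANE_BITS)) & REG_MASK


def packed_mul(packed: int, w: int) -> tuple[int, int]:
    """Multiply both lanes by the same weight with a single multiplication.

    Valid only while each product fits in LANE_BITS bits; otherwise the
    low lane's product would carry into the high lane.
    """
    prod = (packed * w) & REG_MASK
    return prod & LANE_MASK, (prod >> LANE_BITS) & LANE_MASK


a0, a1, w = 13, 7, 11               # all fit in 4 bits
lo, hi = packed_mul(pack2(a0, a1), w)
assert (lo, hi) == (a0 * w, a1 * w)
```

One multiplication here yields two products, which is the source of the speedup. The trade-off is the lane width: lanes must be wide enough to hold each product (and any accumulation), so lower bitwidths allow more operations per instruction.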

Taxonomy

Topics: Neural Networks and Applications

Methods: Mixed-Precision Neural Network · Convolution