MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs
Junfeng Gong, Cheng Liu, Long Cheng, Huawei Li, Xiaowei Li

TL;DR
MCU-MixQ is a hardware/software co-optimized framework for deploying mixed-precision neural networks on microcontrollers (MCUs). It combines SIMD packing of low-bitwidth operations with neural architecture search (NAS) and reports speedups of 2.1× over CMix-NN and 1.4× over MCUNet.
Contribution
The paper introduces a SIMD packing technique that fits multiple low-bitwidth arithmetic operations into a single SIMD instruction, together with a co-optimized NAS-based framework for efficient mixed-precision neural network deployment on MCUs.
Findings
Achieves 2.1× speedup over CMix-NN
Achieves 1.4× speedup over MCUNet
Effectively balances neural network accuracy and performance
Abstract
Mixed-precision neural networks (MPNNs), which use just enough data width for neural network processing, are an effective approach to meeting the stringent resource constraints of MCUs, including memory and computing. Nevertheless, MCU-class ISAs still lack sub-byte and mixed-precision SIMD operations, so the limited computing capability of MCUs remains underutilized, which further aggravates the compute bound encountered in neural network processing. As a result, the benefits of MPNNs cannot be fully unleashed. In this work, we propose to pack multiple low-bitwidth arithmetic operations within a single instruction multiple data (SIMD) instruction in typical MCUs, and then develop an efficient convolution operator by exploring both the data parallelism and computing parallelism in convolution along with the proposed SIMD packing. Finally, we further leverage Neural…
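The packing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's actual scheme, only the general trick of placing several low-bitwidth operands in separate lanes of one wide register so that a single multiply produces several products. The helper name `packed_mul2_u4`, the unsigned 4-bit operand width, and the 16-bit lane spacing are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the paper): multiply one unsigned 4-bit
 * activation by two unsigned 4-bit weights with a single 32-bit multiply.
 * The weights sit in separate 16-bit lanes; because each product is at most
 * 15 * 15 = 225, it never carries into the neighbouring lane. */
static void packed_mul2_u4(uint8_t a, uint8_t w0, uint8_t w1,
                           uint16_t *p0, uint16_t *p1)
{
    /* Assumed operand range: all inputs must fit in 4 bits. */
    assert(a < 16 && w0 < 16 && w1 < 16);

    /* Pack: w0 in bits 0..15, w1 in bits 16..31. */
    uint32_t packed = (uint32_t)w0 | ((uint32_t)w1 << 16);

    /* One multiplication computes a*w0 and a*w1 at the same time. */
    uint32_t prod = (uint32_t)a * packed;

    /* Unpack the two lane results. */
    *p0 = (uint16_t)(prod & 0xFFFFu);
    *p1 = (uint16_t)(prod >> 16);
}
```

Calling `packed_mul2_u4(13, 7, 15, &p0, &p1)` yields `p0 = 91` and `p1 = 195`. The 16-bit lanes leave headroom for accumulating several such products before a lane overflows. Narrower lanes would pack more operations per instruction at the cost of that headroom, which is the kind of trade-off a mixed-precision design has to weigh.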
Taxonomy
Topics: Neural Networks and Applications
Methods: Mixed-precision Neural Network · Convolution
