Poor Man's Training on MCUs: A Memory-Efficient Quantized Back-Propagation-Free Approach
Yequan Zhao, Hai Li, Ian Young, Zheng Zhang

TL;DR
This paper introduces a memory-efficient, BP-free training method for neural networks on microcontrollers, using quantized zeroth-order gradient estimation and dimension reduction to enable effective training on resource-constrained edge devices.
Contribution
It proposes a novel BP-free training scheme with quantized zeroth-order methods and dimension reduction techniques suitable for microcontrollers, simplifying hardware design and reducing resource requirements.
Findings
Achieves comparable performance to traditional BP-based training on corrupted image data.
Enables dense full-model training on MCUs with 1024-KB SRAM.
Supports sparse training on MCUs with 256-KB SRAM.
Abstract
Back propagation (BP) is the default solution for gradient computation in neural network training. However, implementing BP-based training on various edge devices such as FPGAs, microcontrollers (MCUs), and analog computing platforms faces multiple major challenges, such as the lack of hardware resources, long time-to-market, and dramatic errors in a low-precision setting. This paper presents a simple BP-free training scheme on an MCU, which makes edge training hardware design as easy as inference hardware design. We adopt a quantized zeroth-order method to estimate the gradients of quantized model parameters, which can overcome the error of a straight-through estimator in a low-precision BP scheme. We further employ a few dimension reduction methods (e.g., node perturbation, sparse training) to improve the convergence of zeroth-order training. Experiment results show that our BP-free…
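To make the idea of zeroth-order gradient estimation on quantized parameters concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the two-point estimator with Rademacher (±1) perturbations, the int8 range, the round-and-clip update, and the names `zo_gradient`, `quantized_step`, and `toy_loss` are all assumptions chosen for illustration. The only point it demonstrates is that a gradient estimate can be built from forward evaluations alone while the weights stay on an integer grid.

```python
import random

def zo_gradient(loss, w, rng, num_queries=8, mu=1):
    """Two-point zeroth-order gradient estimate on an integer grid (illustrative)."""
    grad = [0.0] * len(w)
    for _ in range(num_queries):
        # Assumption: Rademacher (+1/-1) perturbations, so w +/- mu*u stays integer.
        u = [rng.choice((-1, 1)) for _ in w]
        w_plus = [wi + mu * ui for wi, ui in zip(w, u)]
        w_minus = [wi - mu * ui for wi, ui in zip(w, u)]
        # Only forward evaluations of the loss are used; there is no backward pass.
        delta = (loss(w_plus) - loss(w_minus)) / (2 * mu)
        for i, ui in enumerate(u):
            grad[i] += delta * ui / num_queries
    return grad

def quantized_step(w, grad, lr, lo=-128, hi=127):
    """SGD-style update, then round and clip back to an assumed int8 range."""
    return [max(lo, min(hi, round(wi - lr * gi))) for wi, gi in zip(w, grad)]

def toy_loss(w, target=(10, -20, 5, 0)):
    """Hypothetical stand-in for a model's forward loss: a simple quadratic."""
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))

rng = random.Random(0)
w = [0, 0, 0, 0]
initial_loss = toy_loss(w)
for _ in range(200):
    g = zo_gradient(toy_loss, w, rng)
    w = quantized_step(w, g, lr=0.1)
final_loss = toy_loss(w)
print(initial_loss, final_loss, w)
```

In this sketch, rounding discards any update smaller than half a quantization step, so progress can stall near the optimum. The variance of the estimate also grows with the number of perturbed parameters, which is one way to read the abstract's motivation for dimension reduction (node perturbation, sparse training).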
Taxonomy
Topics: Face and Expression Recognition · Machine Learning and ELM · Neural Networks and Applications
Methods: ADaptive gradient method with the OPTimal convergence rate
