Poor Man's Training on MCUs: A Memory-Efficient Quantized Back-Propagation-Free Approach

Yequan Zhao; Hai Li; Ian Young; Zheng Zhang

arXiv:2411.05873·cs.LG·November 12, 2024

Open Access

TL;DR

This paper introduces a memory-efficient, BP-free training method for neural networks on microcontrollers, using quantized zeroth-order gradient estimation and dimension reduction to enable effective training on resource-constrained edge devices.

Contribution

The paper proposes a BP-free training scheme that combines quantized zeroth-order gradient estimation with dimension reduction techniques. The scheme is suited to microcontrollers: it simplifies hardware design and reduces resource requirements.

Findings

01. Achieves comparable performance to traditional BP-based training on corrupted image data.

02. Enables dense full-model training on MCUs with 1024-KB SRAM.

03. Supports sparse training on MCUs with 256-KB SRAM.

Abstract

Back propagation (BP) is the default solution for gradient computation in neural network training. However, implementing BP-based training on edge devices such as FPGAs, microcontrollers (MCUs), and analog computing platforms faces several major challenges, including the lack of hardware resources, long time-to-market, and dramatic errors in a low-precision setting. This paper presents a simple BP-free training scheme on an MCU, which makes edge training hardware design as easy as inference hardware design. We adopt a quantized zeroth-order method to estimate the gradients of quantized model parameters, which can overcome the error of a straight-through estimator in a low-precision BP scheme. We further employ a few dimension reduction methods (e.g., node perturbation, sparse training) to improve the convergence of zeroth-order training. Experimental results show that our BP-free…
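
To make the idea of a quantized zeroth-order gradient estimate concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`quantize`, `zo_gradient`), the Rademacher (+/-1) perturbation on the integer grid, the two-sided finite difference, the binary sparsity mask, and the sign-based integer update are all assumptions chosen for illustration, and the toy quadratic loss stands in for a network's forward pass. The sketch covers only the sparse-training flavor of dimension reduction; node perturbation is not shown.

```python
import numpy as np

def quantize(w, scale, bits=8):
    """Round to a signed integer grid and clip to the representable range."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)

def zo_gradient(loss_fn, w_q, mask, num_samples, rng):
    """Estimate d(loss)/d(w_q) from forward evaluations only (no BP).

    Each sample perturbs the unmasked integer weights by +/-1 and uses a
    two-sided finite difference along that direction. Range clipping of the
    perturbed weights is omitted to keep the sketch short.
    """
    grad = np.zeros(w_q.shape, dtype=np.float64)
    for _ in range(num_samples):
        u = rng.choice([-1, 1], size=w_q.shape) * mask  # masked entries stay 0
        delta = loss_fn(w_q + u) - loss_fn(w_q - u)
        grad += 0.5 * delta * u
    return grad / num_samples

# Toy problem (hypothetical): fit dequantized weights to a fixed target.
rng = np.random.default_rng(0)
scale = 0.05
target = np.array([0.4, -0.25, 0.1, 0.3])

def loss_fn(q):
    return float(np.sum((scale * q - target) ** 2))

w_q = quantize(np.zeros(4), scale)
mask = np.array([1, 1, 0, 1])  # sparse training: the third weight is frozen
initial_loss = loss_fn(w_q)

for _ in range(200):
    g = zo_gradient(loss_fn, w_q, mask, num_samples=4, rng=rng)
    # Move each trainable weight by at most one integer step per iteration.
    w_q = w_q - np.sign(g).astype(np.int32) * mask

final_loss = loss_fn(w_q)
```

Each estimate costs `2 * num_samples` forward evaluations and stores no activations for a backward pass, which is the property that makes this family of methods attractive when SRAM is scarce. The mask shrinks the number of perturbed dimensions, which lowers the variance of the estimate in this toy setting.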

Taxonomy

Topics: Face and Expression Recognition · Machine Learning and ELM · Neural Networks and Applications

Methods: ADaptive gradient method with the OPTimal convergence rate