On-Device Training Under 256KB Memory

Ji Lin; Ligeng Zhu; Wei-Ming Chen; Wei-Chen Wang; Chuang Gan; Song Han

arXiv:2206.15472 · cs.CV · April 4, 2024 · 71 cites

TL;DR

This paper introduces a novel algorithm-system co-design framework enabling on-device training of neural networks within 256KB memory, facilitating privacy-preserving, adaptive AI on tiny IoT devices.

Contribution

The paper presents two techniques, Quantization-Aware Scaling and Sparse Update, together with the Tiny Training Engine, to enable efficient, low-memory on-device training without auxiliary memory.

Findings

01 Supports training under 256KB SRAM and 1MB Flash

02 Achieves comparable accuracy to larger models on tinyML tasks

03 Uses less than 1/1000 of the memory of PyTorch and TensorFlow

Abstract

On-device training enables the model to adapt to new data collected from the sensors by fine-tuning a pre-trained model. Users can benefit from customized AI models without having to transfer the data to the cloud, protecting the privacy. However, the training memory consumption is prohibitive for IoT devices that have tiny memory resources. We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. On-device training faces two unique challenges: (1) the quantized graphs of neural networks are hard to optimize due to low bit-precision and the lack of normalization; (2) the limited hardware resource does not allow full back-propagation. To cope with the optimization difficulty, we propose Quantization-Aware Scaling to calibrate the gradient scales and stabilize 8-bit quantized training. To reduce the memory footprint, we propose…
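The gradient-scale problem that Quantization-Aware Scaling targets can be illustrated with a toy example. The sketch below is not the authors' implementation, and the abstract does not give the formula. It assumes symmetric per-tensor quantization in which a float weight W is approximated as s_W * W_bar, so that the chain rule gives a gradient of s_W * G for the integer-valued weight W_bar. Under that assumption the weight-to-gradient norm ratio is off by a factor of s_W^-2, and multiplying the gradient by s_W^-2 restores it. All function names and the sample values are hypothetical.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def quantize_per_tensor(weights, num_bits=8):
    """Symmetric per-tensor quantization (assumed scheme): weights ~ scale * q."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def grad_wrt_quantized(grad_fp, scale):
    """Chain rule under W = scale * W_bar: dL/dW_bar = scale * dL/dW."""
    return [scale * g for g in grad_fp]


def rescale_gradient(grad_q, scale):
    """Compensate the scale mismatch by multiplying the gradient by scale^-2."""
    return [g / (scale * scale) for g in grad_q]


# Hypothetical float weights and gradients for one small layer.
weights_fp = [0.02, -0.05, 0.01, 0.04]
grad_fp = [0.001, -0.002, 0.0005, 0.003]

w_q, s_w = quantize_per_tensor(weights_fp)
g_q = grad_wrt_quantized(grad_fp, s_w)
g_scaled = rescale_gradient(g_q, s_w)

# Weight-to-gradient norm ratios: float graph, quantized graph, compensated.
ratio_fp = l2_norm(weights_fp) / l2_norm(grad_fp)
ratio_q = l2_norm(w_q) / l2_norm(g_q)
ratio_scaled = l2_norm(w_q) / l2_norm(g_scaled)

print(f"float ratio:       {ratio_fp:.3e}")
print(f"quantized ratio:   {ratio_q:.3e}")
print(f"compensated ratio: {ratio_scaled:.3e}")
```

In this toy setting the quantized ratio is several orders of magnitude larger than the float ratio, which means a learning rate tuned for the float graph would barely move the quantized weights. After compensation the ratio matches the float one up to rounding error.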

Code & Models

Repositories

mit-han-lab/mcunet
tf

Videos

[Demo] On-Device Training Under 256KB Memory · youtube

On-Device Training Under 256KB Memory · slideslive

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM