On-Device Training Under 256KB Memory
Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han

TL;DR
This paper introduces a novel algorithm-system co-design framework enabling on-device training of neural networks within 256KB memory, facilitating privacy-preserving, adaptive AI on tiny IoT devices.
Contribution
The paper presents Quantization-Aware Scaling and Sparse Update, along with the Tiny Training Engine, to enable efficient, low-memory on-device training without auxiliary memory.
Findings
Supports training under 256KB SRAM and 1MB Flash
Achieves comparable accuracy to larger models on tinyML tasks
Uses less than 1/1000 of the memory required by PyTorch and TensorFlow
Abstract
On-device training enables a model to adapt to new data collected from sensors by fine-tuning a pre-trained model. Users can benefit from customized AI models without having to transfer their data to the cloud, which protects their privacy. However, the training memory consumption is prohibitive for IoT devices that have tiny memory resources. We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. On-device training faces two unique challenges: (1) the quantized graphs of neural networks are hard to optimize due to low bit-precision and the lack of normalization; (2) limited hardware resources do not allow full back-propagation. To cope with the optimization difficulty, we propose Quantization-Aware Scaling to calibrate the gradient scales and stabilize 8-bit quantized training. To reduce the memory footprint, we propose…
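The gradient-scale calibration mentioned in the abstract can be illustrated with a small numerical sketch. The abstract as shown here does not give the formula, so the following rests on assumptions: a quantized weight is modeled as `w_q = w / s_w` with a per-channel scale `s_w` (rounding ignored), and the compensation is assumed to multiply the quantized-weight gradient by `s_w**-2`. The function name `qas_scale_weight_grad` and all array values are hypothetical, chosen only for illustration.

```python
import numpy as np

def qas_scale_weight_grad(grad_q, s_w):
    """Rescale a quantized-weight gradient by s_w**-2 per output channel.

    Assumption: this is the form of the compensation; it is not stated
    in the truncated abstract above.
    """
    s_w = np.asarray(s_w, dtype=np.float64)
    # Reshape the per-channel scale so it broadcasts over the remaining dims.
    shape = (-1,) + (1,) * (grad_q.ndim - 1)
    return grad_q * s_w.reshape(shape) ** -2

def weight_to_grad_ratio(w, g):
    """Per-channel ratio of weight norm to gradient norm."""
    return np.linalg.norm(w, axis=1) / np.linalg.norm(g, axis=1)

rng = np.random.default_rng(0)
w_fp = rng.normal(size=(4, 8))               # hypothetical float weights
g_fp = rng.normal(size=(4, 8))               # hypothetical float gradients
s_w = np.array([0.01, 0.02, 0.005, 0.04])    # hypothetical per-channel scales

# Simplified quantization model (rounding ignored): w_fp = s_w * w_q.
w_q = w_fp / s_w[:, None]
# Chain rule: dL/dw_q = dL/dw_fp * s_w.
g_q = g_fp * s_w[:, None]

ratio_fp = weight_to_grad_ratio(w_fp, g_fp)
ratio_q = weight_to_grad_ratio(w_q, g_q)     # distorted by a factor s_w**-2
ratio_qas = weight_to_grad_ratio(w_q, qas_scale_weight_grad(g_q, s_w))
```

Under this simplified model, `ratio_q` differs from `ratio_fp` by the factor `s_w**-2` in each channel, while `ratio_qas` matches `ratio_fp`. In other words, after rescaling, a plain SGD step on the quantized weights changes them by the same relative amount as it would in floating point.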
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
