ElasticZO: A Memory-Efficient On-Device Learning with Combined Zeroth- and First-Order Optimization
Keisuke Sugiura, Hiroki Matsutani

TL;DR
ElasticZO is a hybrid zeroth- and first-order optimization method for on-device neural network training. It reduces memory usage and training time while maintaining accuracy, which makes it suitable for edge devices.
Contribution
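The paper proposes ElasticZO, a hybrid approach to on-device training that applies backpropagation (BP) to the last few layers and zeroth-order (ZO) optimization to the remaining layers. It also proposes ElasticZO-INT8, the first ZO-based training method that uses integer arithmetic only.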
Findings
ElasticZO achieves 5.2-9.5% higher accuracy than vanilla ZO.
ElasticZO's memory overhead is only 0.072-1.7%.
ElasticZO-INT8 reduces training time by 1.38-1.42x.
Abstract
Zeroth-order (ZO) optimization is being recognized as a simple yet powerful alternative to standard backpropagation (BP)-based training. Notably, ZO optimization allows for training with only forward passes and (almost) the same memory as inference, making it well-suited for edge devices with limited computing and memory resources. In this paper, we propose ZO-based on-device learning (ODL) methods for full-precision and 8-bit quantized deep neural networks (DNNs), namely ElasticZO and ElasticZO-INT8. ElasticZO lies in the middle between pure ZO- and pure BP-based approaches, and is based on the idea to employ BP for the last few layers and ZO for the remaining layers. ElasticZO-INT8 achieves integer arithmetic-only ZO-based training for the first time, by incorporating a novel method for computing quantized ZO gradients from integer cross-entropy loss values. Experimental results on…
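The hybrid idea described in the abstract (BP for the last layer, ZO for the rest) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy model, the two-point SPSA-style estimator with Gaussian perturbations, the function names, and all hyperparameters are assumptions chosen for illustration. The early weights are updated from loss values of two forward passes only, while the last layer uses an exact gradient that needs just the hidden activations.

```python
import math
import random

def forward(w_early, w_last, x):
    # Toy two-stage model (assumed): hidden = tanh(w_early * x), output = w_last . hidden
    hidden = [math.tanh(w * x) for w in w_early]
    out = sum(wl * h for wl, h in zip(w_last, hidden))
    return hidden, out

def loss_fn(w_early, w_last, data):
    # Mean squared error, computed with forward passes only
    return sum((forward(w_early, w_last, x)[1] - t) ** 2 for x, t in data) / len(data)

def zo_grad(w_early, w_last, data, rng, eps=1e-3):
    # Two-point ZO estimate (assumed estimator): perturb the early weights
    # along a random direction z and difference the two loss values
    z = [rng.gauss(0.0, 1.0) for _ in w_early]
    plus = [w + eps * zi for w, zi in zip(w_early, z)]
    minus = [w - eps * zi for w, zi in zip(w_early, z)]
    g = (loss_fn(plus, w_last, data) - loss_fn(minus, w_last, data)) / (2 * eps)
    return [g * zi for zi in z]

def bp_grad_last(w_early, w_last, data):
    # Exact (first-order) gradient for the last layer only
    grad = [0.0] * len(w_last)
    for x, t in data:
        hidden, out = forward(w_early, w_last, x)
        for i, h in enumerate(hidden):
            grad[i] += 2 * (out - t) * h / len(data)
    return grad

def train(data, steps=300, lr=0.05, seed=0):
    rng = random.Random(seed)
    w_early = [0.5, -0.5]   # updated with ZO
    w_last = [0.1, 0.1]     # updated with BP
    history = [loss_fn(w_early, w_last, data)]
    for _ in range(steps):
        g_early = zo_grad(w_early, w_last, data, rng)
        g_last = bp_grad_last(w_early, w_last, data)
        w_early = [w - lr * g for w, g in zip(w_early, g_early)]
        w_last = [w - lr * g for w, g in zip(w_last, g_last)]
        history.append(loss_fn(w_early, w_last, data))
    return w_early, w_last, history

# Toy regression task (assumed): fit tanh(x) on a small grid
data = [(x / 4.0, math.tanh(x / 4.0)) for x in range(-8, 9)]
w_early, w_last, history = train(data)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

In this sketch the ZO part never stores activations for a backward pass, which is the property the abstract highlights for memory-constrained devices. Moving more layers to the BP side would trade extra memory for a less noisy gradient.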
Taxonomy
Topics: Neural Networks and Reservoir Computing · Machine Learning and ELM · Neural Networks and Applications
