ElasticZO: A Memory-Efficient On-Device Learning with Combined Zeroth- and First-Order Optimization

Keisuke Sugiura; Hiroki Matsutani

arXiv:2501.04287·cs.LG·January 9, 2025

TL;DR

ElasticZO is a hybrid zeroth- and first-order optimization method for training neural networks on edge devices. It keeps memory usage close to that of inference while reaching higher accuracy than pure ZO training, and its INT8 variant also shortens training time.

Contribution

The paper proposes ElasticZO, a hybrid on-device training approach that applies BP to the last few layers and ZO to the remaining layers, and ElasticZO-INT8, the first ZO-based training method that uses integer arithmetic only.

Findings

01

ElasticZO achieves 5.2-9.5% higher accuracy than vanilla ZO.

02

ElasticZO adds only 0.072-1.7% memory overhead relative to vanilla ZO.

03

ElasticZO-INT8 reduces training time by 1.38-1.42x.

Abstract

Zeroth-order (ZO) optimization is being recognized as a simple yet powerful alternative to standard backpropagation (BP)-based training. Notably, ZO optimization allows for training with only forward passes and (almost) the same memory as inference, making it well-suited for edge devices with limited computing and memory resources. In this paper, we propose ZO-based on-device learning (ODL) methods for full-precision and 8-bit quantized deep neural networks (DNNs), namely ElasticZO and ElasticZO-INT8. ElasticZO lies midway between pure ZO- and pure BP-based approaches, and is based on the idea of employing BP for the last few layers and ZO for the remaining layers. ElasticZO-INT8 achieves integer arithmetic-only ZO-based training for the first time by incorporating a novel method for computing quantized ZO gradients from integer cross-entropy loss values. Experimental results on…
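
To illustrate the hybrid idea described in the abstract (BP for the last layers, ZO for the rest), here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a two-layer MLP, a two-point random-perturbation (SPSA-style) estimator for the ZO part, and separate learning rates for the two parts. The names `forward`, `hybrid_step`, `eps`, `lr_zo`, and `lr_bp` are hypothetical choices made for this example, and the three separate forward passes are used for clarity rather than efficiency.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(W1, W2, x):
    # Two-layer MLP: ReLU hidden layer followed by a softmax classifier.
    h = np.maximum(x @ W1, 0.0)
    return h, softmax(h @ W2)

def cross_entropy(p, y):
    # Mean cross-entropy for integer class labels y.
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

def hybrid_step(W1, W2, x, y, rng, eps=1e-3, lr_zo=0.01, lr_bp=0.1):
    """One update: ZO estimate for W1 (first layer), exact BP for W2 (last layer)."""
    n = len(y)

    # ZO part (assumed SPSA-style estimator): perturb W1 along a random
    # direction u and compare the losses of two forward passes.
    u = rng.standard_normal(W1.shape)
    _, p_plus = forward(W1 + eps * u, W2, x)
    _, p_minus = forward(W1 - eps * u, W2, x)
    g = (cross_entropy(p_plus, y) - cross_entropy(p_minus, y)) / (2.0 * eps)

    # BP part: exact gradient for the last layer only. Only the input to
    # that layer (h) is needed, not the activations of earlier layers.
    h, p = forward(W1, W2, x)
    d_logits = p.copy()
    d_logits[np.arange(n), y] -= 1.0
    grad_W2 = h.T @ d_logits / n

    W1_new = W1 - lr_zo * g * u      # ZO update: scalar times direction
    W2_new = W2 - lr_bp * grad_W2    # first-order update
    return W1_new, W2_new, cross_entropy(p, y)
```

In this sketch the ZO layer never stores a gradient tensor of its own: its update is a single scalar `g` multiplied by the random direction `u`, which could be regenerated from a seed instead of being kept in memory. Moving the boundary between the two parts (more layers under BP, fewer under ZO) is the knob that trades memory for accuracy.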

Taxonomy

Topics: Neural Networks and Reservoir Computing · Machine Learning and ELM · Neural Networks and Applications