Scaling of hardware-compatible perturbative training algorithms
Bakhrom G. Oripov, Andrew Dienstfrey, Adam N. McCaughan, and Sonia M. Buckley

TL;DR
This paper demonstrates that multiplexed gradient descent (MGD), a perturbative training method, scales efficiently with network size, enabling hardware-compatible training of large neural networks with accuracy comparable to backpropagation.
Contribution
The work extends the MGD framework to cover both weight perturbation and node perturbation, shows that the method scales well with network size, and shows that its gradient estimates can replace backpropagated gradients in stochastic gradient descent for hardware-efficient training.
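The idea of replacing the gradient in stochastic gradient descent with a perturbative estimate can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a generic simultaneous weight-perturbation estimator with random sign perturbations, applied to a toy linear regression. The names `loss`, `perturbative_gradient`, and `train`, and all hyperparameter values, are assumptions chosen for illustration only.

```python
import numpy as np

def loss(theta, X, y):
    """Mean squared error of a linear model (stand-in for a hardware cost readout)."""
    return float(np.mean((X @ theta - y) ** 2))

def perturbative_gradient(theta, X, y, rng, amplitude=1e-3, n_probes=1):
    """Zeroth-order gradient estimate from simultaneous random-sign weight perturbations.

    Only loss evaluations are used; no derivatives are computed analytically.
    """
    base = loss(theta, X, y)
    estimate = np.zeros_like(theta)
    for _ in range(n_probes):
        # Perturb every weight at once by +/- amplitude.
        delta = amplitude * rng.choice([-1.0, 1.0], size=theta.shape)
        # Change in cost caused by the perturbation.
        d_cost = loss(theta + delta, X, y) - base
        # Correlate the cost change with the perturbation of each weight.
        estimate += d_cost * delta / amplitude**2
    return estimate / n_probes

def train(steps=2000, lr=0.02, seed=0):
    """Plain SGD loop with the perturbative estimate in place of the true gradient."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 5))
    true_theta = np.arange(1.0, 6.0)
    y = X @ true_theta
    theta = np.zeros(5)
    for _ in range(steps):
        theta -= lr * perturbative_gradient(theta, X, y, rng)
    return theta, true_theta, loss(theta, X, y)

if __name__ == "__main__":
    theta, true_theta, final_loss = train()
    print("recovered weights:", np.round(theta, 3))
    print("final loss:", final_loss)
```

Each update costs two loss evaluations regardless of the number of weights, and averaging the estimate over many probes approaches the true gradient. Node perturbation would instead perturb neuron activations rather than weights; it is not shown here.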
Findings
The time for MGD to reach a target accuracy does not scale linearly with the number of network parameters, even though the time to estimate the gradient does.
MGD achieves accuracy comparable to backpropagation on large networks.
MGD is compatible with optimization accelerators like momentum.
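Compatibility with momentum follows from the fact that a perturbative estimate can be passed to the optimizer wherever a gradient is expected. The sketch below is an illustration under assumptions, not the paper's code: it feeds a generic random-sign weight-perturbation estimate into a standard heavy-ball momentum update on a toy quadratic loss. The function names and the values of `lr`, `beta`, and `amplitude` are hypothetical.

```python
import numpy as np

def quadratic_loss(theta, target):
    """Toy cost: squared distance to a target weight vector."""
    return float(np.sum((theta - target) ** 2))

def estimate_gradient(theta, target, rng, amplitude=1e-3):
    """Single-probe zeroth-order gradient estimate from a random-sign perturbation."""
    delta = amplitude * rng.choice([-1.0, 1.0], size=theta.shape)
    d_cost = quadratic_loss(theta + delta, target) - quadratic_loss(theta, target)
    return d_cost * delta / amplitude**2

def sgd_momentum(target, steps=1500, lr=0.005, beta=0.9, seed=1):
    """Heavy-ball momentum driven by the perturbative estimate instead of a true gradient."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(target)
    velocity = np.zeros_like(target)
    for _ in range(steps):
        g = estimate_gradient(theta, target, rng)
        # Accumulate the noisy estimates into a running velocity.
        velocity = beta * velocity + g
        theta = theta - lr * velocity
    return theta

if __name__ == "__main__":
    target = np.array([1.0, -2.0, 0.5, 3.0])
    print(np.round(sgd_momentum(target), 3))
```

The optimizer code is unchanged from ordinary momentum SGD; only the source of `g` differs.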
Abstract
In this work, we explore the capabilities of multiplexed gradient descent (MGD), a scalable and efficient perturbative zeroth-order training method for estimating the gradient of a loss function in hardware and training it via stochastic gradient descent. We extend the framework to include both weight and node perturbation, and discuss the advantages and disadvantages of each approach. We investigate the time to train networks using MGD as a function of network size and task complexity. Previous research has suggested that perturbative training methods do not scale well to large problems, since in these methods the time to estimate the gradient scales linearly with the number of network parameters. However, in this work we show that the time to reach a target accuracy--that is, actually solve the problem of interest--does not follow this undesirable linear scaling, and in fact often…
Taxonomy
Topics: Simulation Techniques and Applications
