Towards Exact Gradient-based Training on Analog In-memory Computing
Zhaoxian Wu, Tayfun Gokmen, Malte J. Rasch, Tianyi Chen

TL;DR
This paper develops a theoretical foundation for gradient-based training on analog in-memory devices: it explains why asymmetric device updates keep SGD from converging exactly, and it proves that the existing heuristic algorithm Tiki-Taka converges exactly.
Contribution
It characterizes the convergence problem of SGD on analog devices, establishes a lower bound on its asymptotic error, and rigorously analyzes Tiki-Taka, a heuristic analog training algorithm, showing that it converges exactly.
Findings
SGD converges inexactly on analog devices because their updates are asymmetric.
A lower bound on the asymptotic error shows that this is a fundamental performance limit of SGD-based analog training, not an artifact of the analysis.
Tiki-Taka guarantees exact convergence to a critical point.
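
The findings above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the paper's experimental setup: the toy objective f(w) = 0.5 * (w - W_STAR)**2, the asymmetric device model in `analog_step` (each increment is accompanied by a pull towards zero proportional to its magnitude), the scalar Tiki-Taka-style update with an auxiliary device `p`, and all constants and function names are hypothetical choices made for illustration.

```python
import random

W_STAR = 0.5   # minimizer of the toy objective f(w) = 0.5 * (w - W_STAR)**2
SIGMA = 1.0    # standard deviation of the gradient noise
TAU = 1.0      # assumed device saturation level
ALPHA = 0.01   # step size for the main weight
BETA = 0.01    # step size for the auxiliary device (Tiki-Taka-style run only)
STEPS = 20000


def noisy_grad(w, rng):
    """Stochastic gradient of the toy quadratic."""
    return (w - W_STAR) + rng.gauss(0.0, SIGMA)


def analog_step(w, delta):
    """Apply increment `delta` on an assumed asymmetric device.

    The extra term -|delta| * w / TAU pulls the stored value towards zero
    regardless of the sign of `delta`; this is the modelling assumption
    that makes upward and downward updates unequal.
    """
    return w + delta - abs(delta) * w / TAU


def simulate(method, seed=0):
    """Run one method and return the mean of w over the second half of the run."""
    rng = random.Random(seed)
    w, p = 0.0, 0.0
    history = []
    for _ in range(STEPS):
        g = noisy_grad(w, rng)
        if method == "digital":
            w = w - ALPHA * g                # ideal, symmetric update
        elif method == "analog_sgd":
            w = analog_step(w, -ALPHA * g)   # SGD applied directly on the device
        elif method == "tiki_taka_like":
            p = analog_step(p, BETA * g)     # auxiliary device accumulates gradients
            w = analog_step(w, -ALPHA * p)   # main weight is updated from it
        else:
            raise ValueError(method)
        history.append(w)
    tail = history[len(history) // 2:]
    return sum(tail) / len(tail)


methods = ("digital", "analog_sgd", "tiki_taka_like")
errors = {m: abs(simulate(m) - W_STAR) for m in methods}
for name, err in errors.items():
    print(f"{name:>15}: |mean(w) - w*| = {err:.3f}")
```

In this toy setting the analog SGD iterate settles visibly away from W_STAR, because the noise-driven pull towards zero does not average out, while the Tiki-Taka-style run stays much closer. With constant step sizes the latter is not expected to be exactly unbiased here; the sketch only conveys the qualitative gap described in the findings.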
Abstract
Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI. While inference on analog accelerators has been studied recently, the training perspective is underexplored. Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence issue of SGD, which is caused by the asymmetric updates on the analog devices. We then provide a lower bound on the asymptotic error to show that this is a fundamental performance limit of SGD-based analog training rather than an artifact of our analysis. To address this issue, we…
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Neural Networks and Applications
Methods: Stochastic Gradient Descent
