Towards Exact Gradient-based Training on Analog In-memory Computing

Zhaoxian Wu; Tayfun Gokmen; Malte J. Rasch; Tianyi Chen

arXiv:2406.12774·cs.LG·June 19, 2024

Open Access

TL;DR

This paper develops a theoretical foundation for gradient-based training on analog in-memory devices: it traces the inexact convergence of SGD to asymmetric device updates and shows that the Tiki-Taka algorithm converges exactly to a critical point.

Contribution

It characterizes the non-convergence of SGD on analog devices, establishes a lower bound on its asymptotic error, and proves that Tiki-Taka, an existing heuristic analog training algorithm, converges exactly to a critical point.

Findings

01. SGD converges inexactly on analog devices because their updates are asymmetric.

02. A lower bound on the asymptotic error shows that SGD-based analog training has a fundamental performance limit.

03. Tiki-Taka guarantees exact convergence to a critical point.

Abstract

Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI. While inference on analog accelerators has been studied recently, the training perspective is underexplored. Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence issue of SGD, which is caused by the asymmetric updates on the analog devices. We then provide a lower bound on the asymptotic error to show that this is a fundamental performance limit of SGD-based analog training rather than an artifact of our analysis. To address this issue, we…
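The inexact convergence described in the abstract can be illustrated with a toy simulation. This is a minimal sketch, not the paper's experiment. It assumes a simplified asymmetric update of the form w ← w − α·g − (α/τ)·|g|·w, where the extra term pulls the weight toward the device's symmetric point at 0. The scalar quadratic objective, the constants, and the function name `tail_mean` are all illustrative assumptions.

```python
import random

# Illustrative constants (assumptions, not taken from the paper).
W_STAR = 0.5      # minimizer of the toy objective f(w) = 0.5 * (w - W_STAR)**2
LR = 0.01         # constant step size (alpha)
TAU = 1.0         # assumed device parameter controlling the asymmetry strength
NOISE_STD = 1.0   # standard deviation of the gradient noise
STEPS = 20000


def tail_mean(analog: bool, seed: int = 0) -> float:
    """Run noisy SGD and return the average iterate over the second half."""
    rng = random.Random(seed)
    w = 0.0
    tail = []
    for k in range(STEPS):
        # Stochastic gradient of the toy quadratic objective.
        g = (w - W_STAR) + rng.gauss(0.0, NOISE_STD)
        step = -LR * g
        if analog:
            # Assumed asymmetric-device term: shrinks w toward 0 in
            # proportion to the magnitude of the applied update.
            step -= (LR / TAU) * abs(g) * w
        w += step
        if k >= STEPS // 2:
            tail.append(w)
    return sum(tail) / len(tail)


digital_mean = tail_mean(analog=False)
analog_mean = tail_mean(analog=True)
print(f"digital SGD average iterate: {digital_mean:.3f}")
print(f"analog  SGD average iterate: {analog_mean:.3f}")
```

In this toy model the digital iterates average out near `W_STAR`, while the analog iterates settle at a point biased toward 0. Setting `NOISE_STD` to 0 removes the bias here, which suggests that in this sketch the error comes from the interaction between gradient noise and the asymmetric term.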

Taxonomy

Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Neural Networks and Applications

Methods: Stochastic Gradient Descent