On the Convergence Theory of Pipeline Gradient-based Analog In-memory Training
Zhaoxian Wu, Quan Xiao, Tayfun Gokmen, Hsinyu Tsai, Kaoutar El Maghraoui, Tianyi Chen

TL;DR
This paper analyzes the convergence of asynchronous pipeline gradient-based training on analog in-memory computing hardware, demonstrating that it converges efficiently despite hardware imperfections and stale weights.
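To show why stale weights need not break convergence, here is a toy illustration. It is a hypothetical digital sketch on a one-dimensional quadratic, not the paper's Analog-SGD-AP algorithm or its analysis. Each gradient is evaluated at an iterate that is `delay` updates old, loosely mimicking a pipeline stage that computes with outdated weights. The function name `delayed_sgd`, the step size, the noise level, and the delay values are all assumptions chosen for illustration.

```python
import random


def delayed_sgd(delay: int, steps: int = 2000, lr: float = 0.05,
                noise: float = 0.1, seed: int = 0) -> float:
    """Hypothetical toy: SGD on f(w) = 0.5 * w**2 with gradients taken at a stale iterate.

    The gradient applied at step t is evaluated at w_{t - delay}, loosely
    mimicking a pipeline stage that computes with weights `delay` updates old.
    Returns the final squared distance to the minimizer w* = 0.
    """
    rng = random.Random(seed)
    # Sliding window of the last (delay + 1) iterates, oldest first.
    history = [5.0] * (delay + 1)
    w = history[-1]
    for _ in range(steps):
        stale_w = history[0]                    # w_{t - delay}
        grad = stale_w + rng.gauss(0.0, noise)  # noisy gradient of 0.5 * w**2
        w = w - lr * grad                       # update the current iterate
        history.append(w)
        history.pop(0)
    return w * w


if __name__ == "__main__":
    for d in (0, 2, 4):
        print(f"delay={d}: final error {delayed_sgd(d):.2e}")
```

With a small enough step size, the delayed iteration still contracts toward the minimizer and settles at a noise floor. The paper's results make this kind of statement rigorous for the analog, asynchronous-pipeline setting.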
Contribution
It provides the first theoretical convergence analysis of asynchronous pipeline SGD on AIMC hardware, showing efficiency nearly equivalent to that of digital SGD.
Findings
Analog-SGD-AP converges with complexity $O(\frac{1}{\varepsilon^2} + \frac{1}{\varepsilon})$
Asynchronous pipelining overlaps computation across stages while nearly matching the efficiency of a synchronous pipeline
Hardware imperfections and stale weights do not significantly hinder convergence
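As ε → 0, 1/ε grows more slowly than 1/ε². When both constants equal 1, the second term's share of the bound is (1/ε)/(1/ε² + 1/ε) = ε/(1 + ε), so that share vanishes for small ε. The short Python sketch below evaluates this shape. The constants `c1` and `c2` are placeholders, because the summary does not state the constants hidden by the O(·) notation.

```python
def bound_shape(eps: float, c1: float = 1.0, c2: float = 1.0) -> float:
    """Evaluate c1 / eps**2 + c2 / eps, the shape of an O(1/eps^2 + 1/eps) bound.

    c1 and c2 are placeholders for the unspecified constants hidden by O(.).
    """
    return c1 / eps**2 + c2 / eps


def extra_term_share(eps: float, c1: float = 1.0, c2: float = 1.0) -> float:
    """Fraction of the bound contributed by the 1/eps term."""
    return (c2 / eps) / bound_shape(eps, c1, c2)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(f"eps={eps:g}: bound~{bound_shape(eps):.3g}, "
              f"1/eps share={extra_term_share(eps):.4f}")
```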
Abstract
Aiming to accelerate the training of large deep neural networks (DNNs) in an energy-efficient way, analog in-memory computing (AIMC) has emerged as a solution with immense potential. AIMC accelerators keep model weights in memory rather than moving them between memory and processors during training, which dramatically reduces overhead. Despite this efficiency, scaling up AIMC systems presents significant challenges. Since weight copying is expensive and inaccurate, data parallelism is less efficient on AIMC accelerators. This motivates the exploration of pipeline parallelism, particularly asynchronous pipeline parallelism, which utilizes all available accelerators throughout training. This paper examines the convergence theory of stochastic gradient descent on AIMC hardware with an asynchronous pipeline (Analog-SGD-AP). Although AIMC accelerators have been explored empirically, the theoretical…
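The claim that an asynchronous pipeline "utilizes all available accelerators" can be made concrete with a simplified utilization model. This model is an assumption for illustration and is not taken from the paper. In it, every stage spends one time slot per micro-batch, while backward passes and communication are ignored. A synchronous pipeline fills and drains once per optimizer step. An asynchronous one is filled once and never drained.

```python
def sync_utilization(stages: int, micro_batches: int) -> float:
    """Busy fraction of stage-time slots in a synchronous (flush-every-step) pipeline.

    Simplified model: each stage spends one slot per micro-batch, and the pipeline
    is filled and drained once per optimizer step, so one step of `micro_batches`
    micro-batches occupies micro_batches + stages - 1 slots.
    """
    return micro_batches / (micro_batches + stages - 1)


def async_utilization(stages: int, total_micro_batches: int) -> float:
    """Same model, but the pipeline is filled once and never drained between steps."""
    return total_micro_batches / (total_micro_batches + stages - 1)


if __name__ == "__main__":
    stages = 8
    print(f"sync, 8 micro-batches/step: {sync_utilization(stages, 8):.2f}")
    print(f"async, 10,000 micro-batches: {async_utilization(stages, 10_000):.4f}")
```

Never draining the pipeline has a cost. Some micro-batches are processed with weights that have since been updated, which is the stale-weight effect the paper's convergence analysis has to account for.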
