Distributed stochastic optimization for deep learning (thesis)
Sixin Zhang

TL;DR
This thesis introduces Elastic Averaging SGD (EASGD), a distributed stochastic optimization method for deep learning that improves training speed, accuracy, and communication efficiency, with theoretical analysis and empirical validation on large datasets.
Contribution
The thesis proposes EASGD, analyzes its convergence and stability, and demonstrates its advantages over existing methods such as DOWNPOUR, including better scalability and reduced communication.
Findings
EASGD accelerates training and improves test accuracy.
EASGD requires less communication than baseline methods.
The spread of data impacts convergence and stability.
Abstract
We study the problem of how to distribute the training of large-scale deep learning models in the parallel computing environment. We propose a new distributed stochastic optimization method called Elastic Averaging SGD (EASGD). We analyze the convergence rate of the EASGD method in the synchronous scenario and compare its stability condition with the existing ADMM method in the round-robin scheme. An asynchronous and momentum variant of the EASGD method is applied to train deep convolutional neural networks for image classification on the CIFAR and ImageNet datasets. Our approach accelerates the training and furthermore achieves better test accuracy. It also requires a much smaller amount of communication than other common baseline approaches such as the DOWNPOUR method. We then investigate the limit in speedup of the initial and the asymptotic phase of the mini-batch SGD, the…
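The abstract does not spell out the update rule, so the following is only a minimal sketch of a synchronous elastic-averaging step in the form commonly given for EASGD: each worker takes a gradient step and is pulled toward a shared center variable, while the center moves toward the workers. The toy objective f(x) = 0.5 * x^2, the function name `easgd_quadratic`, and all hyperparameter values are assumptions chosen for illustration, not settings taken from the thesis.

```python
import random


def easgd_quadratic(num_workers=4, steps=200, eta=0.1, rho=1.0, noise=0.1, seed=0):
    """Synchronous elastic-averaging sketch on f(x) = 0.5 * x**2.

    Illustrative assumptions: 1-D parameters, Gaussian gradient noise,
    and an elastic coefficient alpha = eta * rho.
    """
    rng = random.Random(seed)
    # Each worker holds its own local copy of the parameter.
    workers = [rng.uniform(-5.0, 5.0) for _ in range(num_workers)]
    # The center variable starts at the average of the local copies.
    center = sum(workers) / num_workers
    alpha = eta * rho

    for _ in range(steps):
        # Elastic differences are computed from the values at the start of the step.
        diffs = [x - center for x in workers]
        updated = []
        for x, d in zip(workers, diffs):
            grad = x + rng.gauss(0.0, noise)  # noisy gradient of 0.5 * x**2
            # Local step: gradient descent plus a pull toward the center.
            updated.append(x - eta * grad - alpha * d)
        # Center step: move toward the workers by the summed elastic differences.
        center = center + alpha * sum(diffs)
        workers = updated

    return center, workers


if __name__ == "__main__":
    center, workers = easgd_quadratic()
    print("center:", center)
    print("workers:", workers)
```

In this sketch the only quantity exchanged between a worker and the center is the elastic difference `x - center`. Exchanging it every few local steps instead of every step is one way to reduce communication, though the schedule used in the thesis is not stated in this summary.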
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Privacy-Preserving Technologies in Data
Methods: Alternating Direction Method of Multipliers · Stochastic Gradient Descent
