Distributed stochastic optimization for deep learning (thesis)

Sixin Zhang

arXiv:1605.02216·cs.LG·May 10, 2016·1 cites

TL;DR

This thesis introduces Elastic Averaging SGD (EASGD), a distributed stochastic optimization method for deep learning. Compared with baselines such as DOWNPOUR, it trains faster, reaches better test accuracy, and needs less communication. The claims are backed by convergence and stability analysis and by image-classification experiments on CIFAR and ImageNet.

Contribution

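The thesis proposes EASGD, analyzes its convergence rate in the synchronous scenario, and compares its stability condition with that of ADMM in the round-robin scheme. It then shows that an asynchronous, momentum variant of EASGD trains faster and communicates less than existing methods such as DOWNPOUR.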

Findings

01. EASGD accelerates training and improves test accuracy on CIFAR and ImageNet image classification.

02. EASGD requires much less communication than baseline methods such as DOWNPOUR.

03. The spread of data impacts convergence and stability.

Abstract

We study the problem of how to distribute the training of large-scale deep learning models in the parallel computing environment. We propose a new distributed stochastic optimization method called Elastic Averaging SGD (EASGD). We analyze the convergence rate of the EASGD method in the synchronous scenario and compare its stability condition with the existing ADMM method in the round-robin scheme. An asynchronous and momentum variant of the EASGD method is applied to train deep convolutional neural networks for image classification on the CIFAR and ImageNet datasets. Our approach accelerates the training and furthermore achieves better test accuracy. It also requires a much smaller amount of communication than other common baseline approaches such as the DOWNPOUR method. We then investigate the limit in speedup of the initial and the asymptotic phase of the mini-batch SGD, the…
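The abstract does not spell out the update rule, so the following is only a minimal sketch of the commonly cited synchronous form of elastic averaging: each worker takes a gradient step while being pulled toward a shared center variable, and the center is pulled toward the workers. The function name `easgd_sync`, the symbols `eta` (learning rate) and `rho` (elastic coefficient), the scalar quadratic objective, and the Gaussian gradient noise are all assumptions made for illustration; none of them come from the text above, and this is not the asynchronous momentum variant used in the experiments.

```python
import random


def easgd_sync(grad, x0, workers=4, eta=0.05, rho=1.0,
               steps=200, noise=0.1, seed=0):
    """Toy synchronous elastic averaging on a scalar parameter.

    Hypothetical illustration only: all names and defaults are assumptions.
    """
    rng = random.Random(seed)
    local = [x0 for _ in range(workers)]  # one local variable per worker
    center = x0                           # shared center variable
    alpha = eta * rho                     # elastic moving rate

    for _ in range(steps):
        # Every update in a step uses the values from the previous step.
        diffs = [x - center for x in local]
        local = [
            # Noisy gradient step plus an elastic pull toward the center.
            x - eta * (grad(x) + rng.gauss(0.0, noise)) - alpha * d
            for x, d in zip(local, diffs)
        ]
        # The center moves toward the workers by the summed differences.
        center = center + alpha * sum(diffs)

    return center, local


# Assumed toy objective f(x) = 0.5 * (x - 3)^2, whose gradient is x - 3.
center, local = easgd_sync(grad=lambda x: x - 3.0, x0=0.0)
print(center, local)
```

In this toy setting the center and all local variables should settle near the minimizer at 3. The elastic term is what lets the workers explore independently while still staying coupled through the center; a real deployment would replace the scalar with model parameters and the lambda with a mini-batch gradient.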

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Privacy-Preserving Technologies in Data

Methods: Alternating Direction Method of Multipliers · Stochastic Gradient Descent