Deep learning with Elastic Averaging SGD

Sixin Zhang; Anna Choromanska; Yann LeCun

arXiv:1412.6651 · cs.LG · October 27, 2015 · ICLR · 67 cites

TL;DR

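This paper introduces Elastic Averaging SGD, a parallel stochastic optimization algorithm for deep learning in which local workers are tied to a center variable by an elastic force. Communicating less often lets the workers explore further from the center, which the authors report leads to faster training and better performance.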

Contribution

The paper proposes Elastic Averaging SGD with synchronous and asynchronous variants, providing stability analysis and demonstrating improved training speed and communication efficiency in deep neural networks.

Findings

01. Accelerates training of deep neural networks compared to baseline methods.

02. Enables more exploration by local workers, improving performance in deep learning.

03. Offers a stable asynchronous variant with proven stability conditions.

Abstract

We study the problem of stochastic optimization for deep learning in the parallel computing environment under communication constraints. A new algorithm is proposed in this setting where the communication and coordination of work among concurrent processes (local workers) is based on an elastic force which links the parameters they compute with a center variable stored by the parameter server (master). The algorithm enables the local workers to perform more exploration, i.e. the algorithm allows the local variables to fluctuate further from the center variable by reducing the amount of communication between local workers and the master. We empirically demonstrate that in the deep learning setting, due to the existence of many local optima, allowing more exploration can lead to improved performance. We propose synchronous and asynchronous variants of the new algorithm. We provide…
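The elastic coupling described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes a scalar parameter, a noisy gradient oracle, and a coupling strength `alpha = eta * rho`, and the function name `easgd_sync` and the parameters `tau` (communication period) and `noise` are hypothetical names chosen for illustration. Each worker takes local SGD steps, and every `tau` steps the workers and the center variable are pulled toward each other.

```python
import random


def easgd_sync(grad, x0, workers=4, steps=200, eta=0.05, rho=1.0,
               tau=4, noise=0.1, seed=0):
    """Toy elastic-averaging loop on a scalar parameter (illustrative only)."""
    rng = random.Random(seed)
    alpha = eta * rho  # assumed form of the elastic coupling strength
    local = [x0 for _ in range(workers)]  # one parameter copy per worker
    center = x0                           # center variable held by the master

    for t in range(steps):
        if t % tau == 0:
            # Communication round: the elastic force pulls each worker
            # toward the center and the center toward the workers.
            diffs = [alpha * (x - center) for x in local]
            local = [x - d for x, d in zip(local, diffs)]
            center = center + sum(diffs)
        # Local step: each worker follows its own noisy gradient.
        local = [x - eta * (grad(x) + rng.gauss(0.0, noise)) for x in local]

    return center, local


if __name__ == "__main__":
    # Quadratic toy objective f(x) = 0.5 * (x - 3)^2, so grad(x) = x - 3.
    center, local = easgd_sync(lambda x: x - 3.0, x0=0.0)
    print("center:", center)
    print("workers:", local)
```

A larger `tau` means less frequent communication, so the local variables can drift further from the center between rounds; this is the exploration effect the abstract refers to, shown here only on a convex toy problem.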

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications

Methods: Alternating Direction Method of Multipliers