Deep learning with Elastic Averaging SGD
Sixin Zhang, Anna Choromanska, Yann LeCun

TL;DR
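This paper introduces Elastic Averaging SGD (EASGD), a parallel stochastic optimization algorithm for deep learning in which an elastic force ties each local worker's parameters to a center variable held by a parameter server. Because workers communicate less often, they can explore further from the center, which the authors report leads to faster training and better performance.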
Contribution
The paper proposes Elastic Averaging SGD with synchronous and asynchronous variants, providing stability analysis and demonstrating improved training speed and communication efficiency in deep neural networks.
Findings
Accelerates training of deep neural networks compared to baseline methods.
Enables more exploration by local workers, improving performance in deep learning.
Provides a stability analysis that gives conditions under which the asynchronous variant is stable.
Abstract
We study the problem of stochastic optimization for deep learning in the parallel computing environment under communication constraints. A new algorithm is proposed in this setting where the communication and coordination of work among concurrent processes (local workers) is based on an elastic force which links the parameters they compute with a center variable stored by the parameter server (master). The algorithm enables the local workers to perform more exploration, i.e. the algorithm allows the local variables to fluctuate further from the center variable by reducing the amount of communication between local workers and the master. We empirically demonstrate that in the deep learning setting, due to the existence of many local optima, allowing more exploration can lead to improved performance. We propose synchronous and asynchronous variants of the new algorithm. We provide…
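The elastic coupling described in the abstract can be illustrated with a small toy. The sketch below is not the authors' implementation: it assumes a synchronous update in which each worker takes a gradient step plus a pull of strength `rho` toward the center, while the center moves toward the workers by the sum of the same differences. The function names (`easgd_sync`, `noisy_quadratic_grad`), the scalar quadratic objective, and all hyperparameter values are hypothetical choices made for illustration.

```python
import random


def noisy_quadratic_grad(x, rng, target=3.0, noise=0.1):
    """Hypothetical stochastic gradient of 0.5 * (x - target)**2 with Gaussian noise."""
    return (x - target) + rng.gauss(0.0, noise)


def easgd_sync(grad, x0, num_workers=4, lr=0.05, rho=0.5, steps=500, seed=0):
    """Toy synchronous elastic-averaging SGD on a single scalar parameter.

    lr is the learning rate and rho the elastic coupling strength
    (both values are illustrative assumptions, not tuned settings).
    """
    rng = random.Random(seed)
    # Each worker starts from a slightly perturbed copy of x0.
    workers = [x0 + rng.uniform(-1.0, 1.0) for _ in range(num_workers)]
    center = x0
    for _ in range(steps):
        # Elastic differences, all measured against the center at the start of the step.
        diffs = [x - center for x in workers]
        # Worker step: local stochastic gradient plus a pull back toward the center.
        workers = [x - lr * (grad(x, rng) + rho * d) for x, d in zip(workers, diffs)]
        # Center step: move toward the workers by the summed differences.
        center = center + lr * rho * sum(diffs)
    return center, workers


center, workers = easgd_sync(noisy_quadratic_grad, x0=0.0)
print("center:", center)
print("workers:", workers)
```

In this toy, a smaller `rho` weakens the pull toward the center, so the workers wander further from it; that is the exploration trade-off the abstract refers to, shown here only on a convex problem where it offers no benefit.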
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications
Methods: Alternating Direction Method of Multipliers
