Gradient Energy Matching for Distributed Asynchronous Gradient Descent

Joeri Hermans; Gilles Louppe

arXiv:1805.08469·cs.LG·May 23, 2018·5 cites


TL;DR

This paper introduces a novel energy-based framework for analyzing and stabilizing distributed asynchronous SGD, leading to a new method called GEM that improves stability, speed, and generalization in large-scale deep learning.

Contribution

It proposes an energy-based stability criterion for asynchronous SGD and develops GEM, a method that keeps the energy of the asynchronous system at or below that of a target synchronous process (sequential SGD with momentum), improving stability and performance.
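As a rough illustration of the criterion, the sketch below treats the "energy" of an update as its kinetic energy, half the squared Euclidean norm, and checks whether the energy contributed by n active workers stays at or below that of one step of a target momentum-SGD process. The function names (`kinetic_energy`, `is_energy_stable`), the norm-based definition of energy, and the choice to sum per-worker energies are assumptions made for illustration; the paper derives its condition via Lagrangian mechanics, and its exact definitions may differ.

```python
import numpy as np


def kinetic_energy(velocity: np.ndarray) -> float:
    """Energy of an update, assumed here to be 0.5 * ||v||^2 (unit mass)."""
    v = np.ravel(velocity)
    return 0.5 * float(np.dot(v, v))


def is_energy_stable(worker_updates, target_velocity) -> bool:
    """Hypothetical criterion: the collective energy of the active workers,
    taken here as the sum of their individual energies, must not exceed the
    energy of the target momentum-SGD step."""
    collective = sum(kinetic_energy(u) for u in worker_updates)
    return collective <= kinetic_energy(target_velocity)


# Target process: one step of sequential SGD with momentum, v = mu * v_prev - lr * g.
mu, lr = 0.9, 0.1
v_prev = np.array([0.5, -0.2])
g = np.array([1.0, 2.0])
target_v = mu * v_prev - lr * g  # energy = 0.5 * (0.35^2 + 0.38^2) ~ 0.133

# Each asynchronous worker pushes a plain SGD step -lr * g (energy 0.025 each).
print(is_energy_stable([-lr * g] * 4, target_v))  # True:  4 * 0.025 <= ~0.133
print(is_energy_stable([-lr * g] * 8, target_v))  # False: 8 * 0.025 >  ~0.133
```

With these illustrative numbers, four workers stay within the target energy while eight exceed it, which mirrors the observation that asynchronous SGD tends to become unstable as workers are added.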

Findings

1. GEM achieves greater stability than existing methods.
2. GEM scales effectively to 100 workers.
3. GEM generalizes better than its target, sequential SGD with momentum.

Abstract

Distributed asynchronous SGD has become widely used for deep learning in large-scale systems, but remains notorious for its instability when increasing the number of workers. In this work, we study the dynamics of distributed asynchronous SGD under the lens of Lagrangian mechanics. Using this description, we introduce the concept of energy to describe the optimization process and derive a sufficient condition ensuring its stability as long as the collective energy induced by the active workers remains below the energy of a target synchronous process. Making use of this criterion, we derive a stable distributed asynchronous optimization procedure, GEM, that estimates and maintains the energy of the asynchronous system below or equal to the energy of sequential SGD with momentum. Experimental results highlight the stability and speedup of GEM compared to existing schemes, even when…
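The energy-matching idea can be sketched as follows: each worker keeps a local momentum buffer as a proxy for the target sequential SGD with momentum, then rescales its own update so that n workers together inject roughly the target's energy. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `EnergyMatchingWorker`, the single scalar scale factor per update, and the equal split of the target energy across `n_workers` are hypothetical choices made for illustration.

```python
import numpy as np


class EnergyMatchingWorker:
    """Hypothetical GEM-style worker: rescales its update so that the energy it
    contributes equals 1/n of a locally estimated target energy."""

    def __init__(self, dim: int, n_workers: int, lr: float = 0.1, momentum: float = 0.9):
        self.n_workers = n_workers
        self.lr = lr
        self.momentum = momentum
        # Local proxy for the velocity of the target sequential momentum-SGD process.
        self.target_velocity = np.zeros(dim)

    def step(self, grad: np.ndarray) -> np.ndarray:
        # Update the proxy of the target process: v <- mu * v - lr * g.
        self.target_velocity = self.momentum * self.target_velocity - self.lr * grad
        target_energy = 0.5 * float(np.sum(self.target_velocity ** 2))

        # Raw update this worker would otherwise send to the parameter server.
        update = -self.lr * grad
        update_energy = 0.5 * float(np.sum(update ** 2))
        if update_energy == 0.0:
            return update

        # Assumed equal split: choose s so that n * 0.5 * ||s * update||^2 == target_energy,
        # i.e. s = sqrt(target_energy / (n * update_energy)).
        scale = float(np.sqrt(target_energy / (self.n_workers * update_energy)))
        return scale * update


# Usage: with 4 workers, the first step is scaled by sqrt(1/4) = 0.5, because the
# proxy velocity initially equals the raw step; the scale grows as momentum builds up.
worker = EnergyMatchingWorker(dim=2, n_workers=4)
grad = np.array([1.0, 2.0])
first = worker.step(grad)
second = worker.step(grad)
print(first, second)
```

In this sketch the scale factor starts at sqrt(1/n) and increases as the momentum proxy accumulates, so the workers collectively inject about the energy of one momentum-SGD process rather than n times that of a plain SGD step. The estimator described in the paper may differ, for example in how the target energy is estimated or distributed across workers.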


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Memory and Neural Computing · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent