Accumulated Gradient Normalization
Joeri Hermans, Gerasimos Spanakis, Rico Möckel

TL;DR
This paper introduces a novel distributed optimizer called Accumulated Gradient Normalization that improves asynchronous data parallel optimization by normalizing gradients, reducing staleness effects, and enhancing convergence rates.
Contribution
It proposes a new optimizer that normalizes gradients to better align worker contributions and mitigate implicit momentum, improving asynchronous training stability and efficiency.
Findings
Achieves better convergence rates than EASGD and DynSGD.
Effectively mitigates parameter staleness in asynchronous optimization.
Empirically outperforms existing optimizers in distributed settings.
Abstract
This work addresses the instability in asynchronous data parallel optimization. It does so by introducing a novel distributed optimizer which is able to efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. As a result, the magnitude of a worker delta is smaller than that of an accumulated gradient, while its direction towards a minimum is better than that of a single first-order gradient. This in turn forces possible implicit momentum fluctuations to be more aligned, under the assumption that all workers contribute towards a single minimum. Since staleness in asynchrony induces (implicit) momentum, our approach mitigates the parameter staleness problem more effectively, and achieves a better convergence rate compared to other optimizers…
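The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes that "normalized" means the worker runs `lam` local SGD steps, accumulates the resulting updates, and divides the sum by `lam` before committing it to the parameter server. The quadratic objective, the function names (`grad`, `worker_delta`, `simulate_agn`) and all hyperparameters are hypothetical. Asynchrony is modeled by letting a randomly chosen worker commit its delta, which it computed from a copy of the central variable that may be several commits stale.

```python
import random

def grad(x, target):
    # Gradient of the hypothetical toy loss 0.5 * ||x - target||^2.
    return [xi - ti for xi, ti in zip(x, target)]

def worker_delta(center, target, lr, lam, noise, rng):
    """Run `lam` local SGD steps from a copy of the central variable and
    return the accumulated update divided by `lam` (assumed normalization)."""
    x = list(center)
    acc = [0.0] * len(x)
    for _ in range(lam):
        g = grad(x, target)
        # Add Gaussian noise to mimic a minibatch gradient estimate.
        step = [-lr * (gi + rng.gauss(0.0, noise)) for gi in g]
        x = [xi + si for xi, si in zip(x, step)]
        acc = [ai + si for ai, si in zip(acc, step)]
    # Normalizing shrinks the delta's magnitude by a factor of `lam`
    # while keeping the direction of the accumulated gradient.
    return [ai / lam for ai in acc]

def simulate_agn(n_workers=4, lam=8, lr=0.1, rounds=200, noise=0.1, seed=0):
    rng = random.Random(seed)
    target = [3.0, -2.0]
    center = [0.0, 0.0]  # central variable held by the parameter server
    # Each worker keeps the copy it last pulled, which becomes stale
    # whenever other workers commit in the meantime.
    copies = [list(center) for _ in range(n_workers)]
    for _ in range(rounds):
        k = rng.randrange(n_workers)  # some worker finishes and commits
        delta = worker_delta(copies[k], target, lr, lam, noise, rng)
        center = [ci + di for ci, di in zip(center, delta)]
        copies[k] = list(center)  # the worker pulls the new central variable
    return center, target

if __name__ == "__main__":
    center, target = simulate_agn()
    print("central variable:", center, "target:", target)
```

Dividing by `lam` keeps each commit small, so deltas computed from stale copies overshoot less when many of them land on the central variable in quick succession.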
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies
