Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum

Guojing Cong; Tianyi Liu

arXiv:2110.00625·cs.LG·October 5, 2021

Open Access

TL;DR

This paper introduces a block momentum method for distributed nonconvex optimization that accelerates training and improves results by combining local stochastic gradients with global momentum at the model averaging level.

Contribution

It proposes a novel block momentum technique for distributed stochastic descent, analyzing its convergence and demonstrating its effectiveness in deep learning training.

Findings

01. Block momentum accelerates training speed.

02. It achieves better model performance.

03. The method scales well with distributed systems.

Abstract

The momentum method has been used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many desirable properties. We propose a momentum method for such model averaging approaches. At the individual learner level, a traditional stochastic gradient update is applied. At the meta-level (global learner level), a single momentum term is applied, which we call block momentum. We analyze the convergence and scaling properties of such momentum methods. Our experimental results show that block momentum not only accelerates training but also achieves better results.
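The two-level structure described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact update rule, so the form used here is an assumption: each learner runs K plain SGD steps from the shared model, the local models are averaged, and a momentum term is applied to the resulting block displacement, v ← μ·v + (w̄ − w), w ← w + v. The function names, hyperparameters, and the toy noisy quadratic objective are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def block_momentum_sgd(grad_fn, w0, n_learners=4, k_steps=8, n_rounds=50,
                       lr=0.05, mu=0.5, seed=0):
    """K-step averaging with a momentum term at the averaging level.

    Assumed update (not taken from the paper):
        v <- mu * v + (w_avg - w),  w <- w + v
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)  # block momentum buffer, kept only at the global level

    for _ in range(n_rounds):
        local_models = []
        for _ in range(n_learners):
            # Each learner starts from the current global model and
            # runs K plain SGD steps with no local momentum.
            w_local = w.copy()
            for _ in range(k_steps):
                w_local -= lr * grad_fn(w_local, rng)
            local_models.append(w_local)

        # Model averaging across learners.
        w_avg = np.mean(local_models, axis=0)

        # Momentum applied to the block displacement, then global update.
        v = mu * v + (w_avg - w)
        w = w + v

    return w


def noisy_quadratic_grad(w, rng, noise=0.1):
    """Stochastic gradient of the toy objective 0.5 * ||w||^2 (hypothetical)."""
    return w + noise * rng.standard_normal(w.shape)


if __name__ == "__main__":
    w_final = block_momentum_sgd(noisy_quadratic_grad, w0=[5.0, -3.0])
    print("distance to optimum:", np.linalg.norm(w_final))
```

The learners are simulated sequentially in one process; in a real distributed setting the inner loop would run in parallel and the averaging step would be a collective operation. The toy objective is convex and serves only to show the mechanics, whereas the paper targets nonconvex problems.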

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM