On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization

Hao Yu; Rong Jin; Sen Yang

arXiv:1905.03817 · math.OC · May 13, 2019 · 147 cites

Open Access

TL;DR

This paper proves that a communication-efficient distributed momentum SGD method maintains linear speedup in non-convex optimization, enabling scalable training of deep neural networks with reduced communication overhead.

Contribution

It introduces and analyzes a distributed momentum SGD algorithm that achieves linear speedup and reduced communication complexity in non-convex settings.

Findings

01. Proves the linear speedup property for the proposed method.

02. Demonstrates reduced communication complexity.

03. Supports scalable training of deep neural networks.
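
The findings above refer to a distributed momentum SGD method with reduced communication, but this page does not spell out the algorithm. The following is only a minimal sketch under stated assumptions: each worker runs heavy-ball momentum SGD locally, and every `sync_every` iterations all workers average both their parameters and their momentum buffers. The function names, the hyperparameters, and the toy non-convex objective are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
import numpy as np

def true_grad(x):
    # Gradient of the toy objective f(x) = sum(x**2 + 0.5*sin(3x)),
    # which is non-convex because f'' = 2 - 4.5*sin(3x) changes sign.
    return 2.0 * x + 1.5 * np.cos(3.0 * x)

def noisy_grad(x, rng, noise=0.1):
    # Stochastic gradient: exact gradient plus Gaussian noise (assumed model).
    return true_grad(x) + noise * rng.standard_normal(x.shape)

def local_momentum_sgd(grad_fn, x0, n_workers=4, steps=200, sync_every=10,
                       lr=0.05, beta=0.9, seed=0):
    """Simulate workers running momentum SGD with periodic averaging."""
    rng = np.random.default_rng(seed)
    # One row per worker; all workers start from the same point.
    x = np.tile(np.asarray(x0, dtype=float), (n_workers, 1))
    u = np.zeros_like(x)  # per-worker momentum buffers
    comm_rounds = 0
    for t in range(1, steps + 1):
        for i in range(n_workers):
            g = grad_fn(x[i], rng)       # local stochastic gradient
            u[i] = beta * u[i] + g       # local momentum update
            x[i] = x[i] - lr * u[i]      # local parameter update
        if t % sync_every == 0:
            # Communication round (assumption): average parameters AND momentum.
            x[:] = x.mean(axis=0)
            u[:] = u.mean(axis=0)
            comm_rounds += 1
    return x.mean(axis=0), comm_rounds

x_start = np.array([2.0, -1.5])
x_final, rounds = local_momentum_sgd(noisy_grad, x_start)
print("communication rounds:", rounds)
print("gradient norm at start:", np.linalg.norm(true_grad(x_start)))
print("gradient norm at end:", np.linalg.norm(true_grad(x_final)))
```

With `steps=200` and `sync_every=10`, the workers communicate in 20 of the 200 iterations rather than in every one, which is the kind of saving "reduced communication complexity" refers to. Whether a given averaging period preserves linear speedup is what the paper's analysis addresses; this simulation does not demonstrate it.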

Abstract

Recent developments on large-scale distributed machine learning applications, e.g., deep neural networks, benefit enormously from the advances in distributed non-convex optimization techniques, e.g., distributed Stochastic Gradient Descent (SGD). A series of recent works study the linear speedup property of distributed SGD variants with reduced communication. The linear speedup property enables us to scale out the computing capability by adding more computing nodes to our system. The reduced communication complexity is desirable since communication overhead is often the performance bottleneck in distributed systems. Recently, momentum methods have been more and more widely adopted in training machine learning models and can often converge faster and generalize better. For example, many practitioners use distributed SGD with momentum to train deep neural networks with big data. However, it…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques

Methods: SGD with Momentum · Stochastic Gradient Descent