Distributed Stochastic Non-Convex Optimization: Momentum-Based Variance Reduction

Prashant Khanduri; Pranay Sharma; Swatantra Kafle; Saikiran Bulusu; Ketan Rajawat; Pramod K. Varshney

arXiv:2005.00224·math.OC·May 4, 2020·1 cites

Open Access

TL;DR

This paper introduces a momentum-based distributed stochastic optimization algorithm for non-convex functions. It achieves optimal complexity and linear speedup without large-batch gradients, which makes it suitable for federated learning.

Contribution

The paper presents a novel single-loop momentum-based distributed algorithm with adaptive and non-adaptive learning rates, eliminating the need for large batch gradients and handling non-i.i.d. data.

Findings

01 Achieves optimal computational complexity for non-convex optimization.

02 Attains linear speedup with the number of worker nodes.

03 Does not require data to be identically distributed across nodes.

Abstract

In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture where a set of $K$ worker nodes (WNs), in collaboration with a server node (SN), jointly aim to minimize a global, potentially non-convex objective function. The objective function is assumed to be the sum of local objective functions available at each WN, with each node having access only to stochastic samples of its local objective function. In contrast to existing approaches, we employ a momentum-based "single loop" distributed algorithm that eliminates the need to compute large-batch gradients to achieve variance reduction. We propose two algorithms, one with "adaptive" and the other with "non-adaptive" learning rates. We show that the proposed algorithms achieve the optimal computational complexity while attaining linear speedup with the…
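The abstract does not state the update rule, so the following is only a minimal sketch of what a single-loop, momentum-based variance-reduced scheme in a worker-server setting could look like. It assumes a STORM-style recursive estimator (each worker corrects its previous direction using one fresh sample evaluated at both the current and the previous iterate) and a server that averages the workers' directions. The toy quadratic local losses, the constant step size `lr`, the momentum weight `a`, and the function names are all hypothetical choices for illustration and are not taken from the paper.

```python
import random


def stochastic_grad(x, center, noise, xi):
    # Stochastic gradient of the toy local loss 0.5 * (x - center)**2.
    # xi is the random sample; noise scales its effect (assumed additive).
    return (x - center) + noise * xi


def run_sketch(centers, steps=200, lr=0.1, a=0.1, noise=0.0, seed=0):
    # centers: one entry per worker node; different values mimic
    # non-identically distributed local data (an illustrative assumption).
    rng = random.Random(seed)
    x = 0.0
    # Each worker starts from a single-sample stochastic gradient (no large batch).
    d = [stochastic_grad(x, c, noise, rng.gauss(0.0, 1.0)) for c in centers]
    for _ in range(steps):
        # Server node: average the workers' directions and update the iterate.
        avg_direction = sum(d) / len(d)
        x_prev, x = x, x - lr * avg_direction
        # Worker nodes: momentum-based variance-reduced update (assumed STORM-style).
        for k, c in enumerate(centers):
            xi = rng.gauss(0.0, 1.0)  # one fresh sample, reused at both points
            g_new = stochastic_grad(x, c, noise, xi)
            g_old = stochastic_grad(x_prev, c, noise, xi)
            d[k] = g_new + (1.0 - a) * (d[k] - g_old)
    return x


if __name__ == "__main__":
    print(run_sketch([1.0, 2.0, 6.0]))             # noise-free run
    print(run_sketch([1.0, 2.0, 6.0], noise=0.5))  # noisy run
```

With `noise=0.0` the recursive estimator coincides with the exact local gradient at every step, so the sketch reduces to gradient descent on the averaged toy objective and the iterate approaches the mean of `centers`. The single loop is the point of the illustration: there is no inner loop and no periodic large-batch gradient, only one sample per worker per iteration.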

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques