Distributed Stochastic Non-Convex Optimization: Momentum-Based Variance Reduction
Prashant Khanduri, Pranay Sharma, Swatantra Kafle, Saikiran Bulusu, Ketan Rajawat, Pramod K. Varshney

TL;DR
This paper introduces a momentum-based distributed stochastic optimization algorithm for non-convex functions, achieving optimal complexity and linear speedup without large batch gradients, suitable for federated learning.
Contribution
The paper presents a novel single-loop momentum-based distributed algorithm with adaptive and non-adaptive learning rates, eliminating the need for large batch gradients and handling non-i.i.d. data.
Findings
Achieves optimal computational complexity for non-convex optimization.
Attains linear speedup with the number of worker nodes.
Does not require data to be identically distributed across nodes.
Abstract
In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture in which a set of worker nodes (WNs), in collaboration with a server node (SN), jointly aim to minimize a global, potentially non-convex objective function. The objective function is assumed to be the sum of local objective functions available at each WN, with each node having access only to stochastic samples of its local objective function. In contrast to existing approaches, we employ a momentum-based "single loop" distributed algorithm that eliminates the need to compute large-batch gradients to achieve variance reduction. We propose two algorithms, one with "adaptive" and the other with "non-adaptive" learning rates. We show that the proposed algorithms achieve the optimal computational complexity while attaining linear speedup with the…
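The single-loop, momentum-based idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact update rule, so the recursion below is an assumption: it uses a recursive momentum estimator in the style of STORM, with a fixed (non-adaptive) learning rate. The toy local loss 0.5 * (x - center)^2, the function names, and the parameters `lr`, `a`, and `noise_std` are all hypothetical and chosen only for illustration. The quadratic is a convex stand-in, not the non-convex setting the paper analyzes.

```python
import random


def local_stochastic_grad(x, center, noise):
    """Noisy gradient of the toy local loss 0.5 * (x - center)**2 (hypothetical)."""
    return (x - center) + noise


def momentum_vr_sgd(centers, steps=500, lr=0.1, a=0.1, noise_std=0.1, seed=0):
    """Sketch of a single-loop, momentum-based variance-reduced method.

    Each entry of `centers` defines one worker's local objective, so the
    workers hold different (non-identically distributed) data.
    """
    rng = random.Random(seed)
    num_workers = len(centers)
    x_prev = x = 0.0

    # Initialisation: each worker computes one plain stochastic gradient
    # from a single sample, with no large batch.
    d = [local_stochastic_grad(x, c, rng.gauss(0.0, noise_std)) for c in centers]

    for _ in range(steps):
        # Server: average the workers' directions and take one descent step.
        x_prev, x = x, x - lr * sum(d) / num_workers

        # Workers: refresh the direction with one new sample, evaluated at
        # both the new and the previous iterate (assumed STORM-style recursion).
        for k, c in enumerate(centers):
            xi = rng.gauss(0.0, noise_std)
            g_new = local_stochastic_grad(x, c, xi)
            g_old = local_stochastic_grad(x_prev, c, xi)
            d[k] = g_new + (1.0 - a) * (d[k] - g_old)

    return x
```

Calling `momentum_vr_sgd([-2.0, 0.5, 1.0, 4.5])` should return a value close to 1.0, the minimizer of the summed toy objectives. Each worker draws one sample per iteration, and the correction term `d[k] - g_old` carries past gradient information forward, which is what removes the need for a periodic large-batch gradient in this sketch.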
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
