Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks

Xuanjie Li; Yuedong Xu; Jessie Hui Wang; Xin Wang; John C.S. Lui

arXiv:2112.10389 · cs.LG · January 25, 2022 · 1 citation

TL;DR

This paper introduces DPSVRG, a decentralized stochastic proximal gradient method with variance reduction that converges significantly faster than the standard decentralized stochastic proximal gradient (DSPG) method.

Contribution

The paper proposes DPSVRG, a novel decentralized algorithm that employs variance reduction to accelerate convergence in decentralized learning with non-smooth regularization.
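
To make the variance-reduction idea concrete, here is a minimal sketch of the standard SVRG-style estimator on a toy one-dimensional least-squares problem. It assumes the estimator takes the usual SVRG form $v = \nabla f_j(x) - \nabla f_j(\tilde{x}) + \mu$, where $\tilde{x}$ is a periodically refreshed snapshot and $\mu$ is the full gradient at that snapshot; the data, the helper names `grad_sample` and `mean_var`, and the chosen points are hypothetical. Averaging over every sample index (the exact expectation under uniform sampling) shows that both estimators are unbiased, while the corrected one has much smaller variance when $x$ is close to $\tilde{x}$.

```python
import random

def grad_sample(x, a, b):
    # Gradient of the per-sample loss 0.5 * (a * x - b) ** 2 with respect to scalar x.
    return (a * x - b) * a

def mean_var(vals):
    # Mean and (population) variance of a list of numbers.
    m = sum(vals) / len(vals)
    return m, sum((v - m) ** 2 for v in vals) / len(vals)

rng = random.Random(0)
data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]  # hypothetical local dataset

x_snap = 0.0   # snapshot point, refreshed only periodically
x = 0.05       # current iterate, assumed close to the snapshot
mu = sum(grad_sample(x_snap, a, b) for a, b in data) / len(data)  # full gradient at snapshot
full = sum(grad_sample(x, a, b) for a, b in data) / len(data)     # exact gradient at x

# One estimate per possible sampled index j (uniform sampling over the dataset).
plain = [grad_sample(x, a, b) for a, b in data]
vr = [grad_sample(x, a, b) - grad_sample(x_snap, a, b) + mu for a, b in data]

m_plain, v_plain = mean_var(plain)
m_vr, v_vr = mean_var(vr)
print(f"plain SGD:  mean={m_plain:.4f} var={v_plain:.2e}")
print(f"SVRG-style: mean={m_vr:.4f} var={v_vr:.2e}  (exact gradient {full:.4f})")
```

The correction costs one extra per-sample gradient per step plus a periodic full pass over the local data; in exchange, the estimator's variance shrinks as the iterate approaches the snapshot, which is the usual reason SVRG-type methods can use a constant step size.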

Findings

1. DPSVRG achieves an $O(1/T)$ convergence rate for convex objectives.
2. DPSVRG converges faster than DSPG, with smoother loss reduction.
3. Experimental results confirm improved convergence across various network topologies.
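
As a worked reading of the first finding, assume the $O(1/T)$ rate bounds some error measure $e_T$ after $T$ iterations by $C/T$ for a problem-dependent constant $C$ (which error measure the paper bounds is not stated on this page). The iteration count needed for a target accuracy $\epsilon$ then follows directly:

$$
e_T \le \frac{C}{T} \quad\Longrightarrow\quad e_T \le \epsilon \ \text{ for all } T \ge \frac{C}{\epsilon}.
$$

Under this reading, halving the target accuracy $\epsilon$ doubles the bound on the number of iterations required.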

Abstract

In decentralized learning, a network of nodes cooperates to minimize an overall objective function that is usually the finite sum of their local objectives and incorporates a non-smooth regularization term for better generalization. The decentralized stochastic proximal gradient (DSPG) method is commonly used to train this type of model, but its convergence is slowed by the variance of the stochastic gradients. In this paper, we propose a novel algorithm, DPSVRG, that accelerates decentralized training by leveraging variance reduction. The basic idea is to introduce an estimator at each node that periodically tracks the local full gradient and corrects the stochastic gradient at each iteration. By transforming our decentralized algorithm into a centralized inexact proximal gradient algorithm with variance reduction, and controlling the…
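
The mechanism described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' DPSVRG. It assumes a least-squares loss per sample and an $\ell_1$ regularizer $\lambda\|x\|_1$, handled by soft-thresholding as its proximal operator. Each node keeps an SVRG-style snapshot that is refreshed once per outer epoch, and a gossip step averages neighbors' iterates with a doubly stochastic matrix drawn from a fixed cycle to mimic a time-varying network. The order of mixing, gradient correction and proximal step is also an assumption, and all function names and parameters (`dpsvrg_sketch`, `step`, `lam`, `epochs`, `inner`) are hypothetical.

```python
import random

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def grad_sample(x, sample):
    # Gradient of the per-sample loss 0.5 * (a . x - b) ** 2.
    a, b = sample
    r = dot(a, x) - b
    return [r * ai for ai in a]

def full_local_grad(x, data):
    # Average of the per-sample gradients over one node's local dataset.
    g = [0.0] * len(x)
    for s in data:
        for k, gk in enumerate(grad_sample(x, s)):
            g[k] += gk / len(data)
    return g

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1: shrink each coordinate toward zero by tau.
    return [c - tau if c > tau else (c + tau if c < -tau else 0.0) for c in v]

def dpsvrg_sketch(datasets, mixing_seq, dim, step=0.05, lam=0.01,
                  epochs=30, inner=20, seed=0):
    rng = random.Random(seed)
    n = len(datasets)
    x = [[0.0] * dim for _ in range(n)]
    t = 0
    for _ in range(epochs):
        # Periodic refresh: each node stores a snapshot and its full local gradient.
        snap = [xi[:] for xi in x]
        mu = [full_local_grad(snap[i], datasets[i]) for i in range(n)]
        for _ in range(inner):
            W = mixing_seq[t % len(mixing_seq)]  # topology used at iteration t
            t += 1
            new_x = []
            for i in range(n):
                s = rng.choice(datasets[i])
                # Variance-reduced estimate of node i's local full gradient.
                v = [gc - gs + m for gc, gs, m in
                     zip(grad_sample(x[i], s), grad_sample(snap[i], s), mu[i])]
                # Gossip step: weighted average of the iterates of node i's neighbors.
                y = [sum(W[i][k] * x[k][d] for k in range(n)) for d in range(dim)]
                # Corrected gradient step followed by the proximal (l1) step.
                z = [yd - step * vd for yd, vd in zip(y, v)]
                new_x.append(soft_threshold(z, step * lam))
            x = new_x
    return x

# Toy usage (hypothetical data): 3 nodes observe noiseless samples of one sparse target.
data_rng = random.Random(1)
x_true = [1.0, 0.0, -2.0]
datasets = []
for _ in range(3):
    local = []
    for _ in range(20):
        a = [data_rng.uniform(-1, 1) for _ in range(3)]
        local.append((a, dot(a, x_true)))
    datasets.append(local)

# Two doubly stochastic matrices, each linking only one pair of nodes; alternating
# them gives a network that is disconnected at every step but connected over time.
W1 = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
W2 = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]

x_final = dpsvrg_sketch(datasets, [W1, W2], dim=3, lam=0.001, epochs=50)
for i, xi in enumerate(x_final):
    print(f"node {i}:", [round(c, 3) for c in xi])
```

Replacing `v` with the plain sample gradient `grad_sample(x[i], s)` turns this loop into a rough DSPG-style baseline under the same assumptions, which is the kind of comparison the findings above describe.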

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Functional Brain Connectivity Studies · Age of Information Optimization