A Bias-Correction Decentralized Stochastic Gradient Algorithm with Momentum Acceleration
Yuchen Hu, Xi Chen, Weidong Liu, Xiaojun Mao

TL;DR
This paper introduces Exact-Diffusion with Momentum (EDM), a momentum-accelerated distributed stochastic gradient algorithm that mitigates the bias caused by data heterogeneity and speeds up convergence in large-scale distributed optimization.
Contribution
It proposes EDM, an algorithm that mitigates the bias from data heterogeneity and incorporates momentum, and it provides rigorous convergence analysis for non-convex objectives and under the Polyak-Lojasiewicz condition.
Findings
For non-convex objectives, converges sub-linearly to a neighborhood of the optimal solution whose radius does not depend on data heterogeneity.
Achieves linear convergence under the Polyak-Lojasiewicz condition, a weaker assumption than strong convexity.
Provides tight convergence bounds for momentum-based distributed algorithms.
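For reference, the Polyak-Lojasiewicz condition named above is usually stated as the inequality below. The symbols f, f^*, and \mu are notation assumed here for illustration and are not taken from the paper.

```latex
\frac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)
\quad \text{for all } x, \qquad \mu > 0, \quad f^{*} = \inf_{x} f(x).
```

Every \mu-strongly convex function satisfies this inequality with the same \mu, but the inequality does not require convexity, which is why it counts as the weaker assumption.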
Abstract
Distributed stochastic optimization algorithms can simultaneously process large-scale datasets, significantly accelerating model training. However, their effectiveness is often hindered by the sparsity of distributed networks and data heterogeneity. In this paper, we propose a momentum-accelerated distributed stochastic gradient algorithm, termed Exact-Diffusion with Momentum (EDM), which mitigates the bias from data heterogeneity and incorporates momentum techniques commonly used in deep learning to enhance the convergence rate. Our theoretical analysis demonstrates that the EDM algorithm converges sub-linearly to the neighborhood of the optimal solution, the radius of which is irrespective of data heterogeneity, when applied to non-convex objective functions; under the Polyak-Lojasiewicz condition, which is a weaker assumption than strong convexity, it converges linearly to the target…
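The abstract does not spell out the EDM update, so the following is only a rough sketch of the general idea, not the authors' algorithm. It takes the standard Exact-Diffusion recursion (adapt, correct, combine with the mixing matrix (I + W)/2) and, as an assumption, feeds it an exponential-moving-average momentum buffer in place of the raw gradient. The toy setup is hypothetical throughout: four agents on a ring, scalar losses f_i(x) = 0.5 (x - b_i)^2 with different b_i to mimic data heterogeneity, and exact rather than stochastic gradients so the run is deterministic. The function name, step size, and momentum value are likewise illustrative choices.

```python
def run_decentralized(b, W, alpha=0.5, beta=0.8, steps=2000, correct=True):
    """Toy decentralized momentum method on f_i(x) = 0.5 * (x - b[i])**2.

    correct=True  : Exact-Diffusion-style correction step (assumed form).
    correct=False : plain adapt-then-combine baseline without correction.
    """
    n = len(b)
    # Damped mixing matrix (I + W) / 2 used by Exact Diffusion.
    w_bar = [[(W[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(n)]
             for i in range(n)]
    x = [0.0] * n          # local iterates, one scalar per agent
    m = [0.0] * n          # local momentum buffers (assumed EMA form)
    psi_prev = list(x)     # previous adapt-step output, initialised to x
    for _ in range(steps):
        # Exact local gradients; a stochastic method would add noise here.
        grad = [x[i] - b[i] for i in range(n)]
        m = [beta * m[i] + (1.0 - beta) * grad[i] for i in range(n)]
        # Adapt: local descent step along the momentum direction.
        psi = [x[i] - alpha * m[i] for i in range(n)]
        # Correct: removes the heterogeneity-induced bias of plain diffusion.
        if correct:
            phi = [psi[i] + x[i] - psi_prev[i] for i in range(n)]
        else:
            phi = psi
        # Combine: average with neighbours over the network.
        x = [sum(w_bar[i][j] * phi[j] for j in range(n)) for i in range(n)]
        psi_prev = psi
    return x


# Hypothetical ring of four agents with heterogeneous local minimisers.
W = [[0.50, 0.25, 0.00, 0.25],
     [0.25, 0.50, 0.25, 0.00],
     [0.00, 0.25, 0.50, 0.25],
     [0.25, 0.00, 0.25, 0.50]]
b = [-3.0, -1.0, 1.0, 3.0]   # global minimiser is mean(b) = 0

x_corrected = run_decentralized(b, W, correct=True)
x_plain = run_decentralized(b, W, correct=False)
print("with correction:   ", x_corrected)
print("without correction:", x_plain)
```

In this toy problem the corrected run should bring every agent close to the global minimiser mean(b) = 0, while the uncorrected baseline leaves the agents spread around it by an amount that grows with the spread of the b_i. With stochastic gradients, as in the paper's setting, one would expect convergence to a neighborhood rather than to the exact point.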
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research
