Improved Convergence for Decentralized Stochastic Optimization with Biased Gradients

Qing Xu; Yiwei Liao; Wenqi Fan; Xingxing You; Songyi Dian

arXiv:2604.08236·math.OC·April 10, 2026

TL;DR

This paper introduces Biased-DMT, a decentralized optimization algorithm that effectively handles biased gradient estimators, ensuring reliable convergence even with communication compression and data heterogeneity.

Contribution

The paper proposes Biased-DMT, a decentralized algorithm built to work with biased gradient estimators, together with a convergence theory for nonconvex settings that decouples the effects of network topology from data heterogeneity.

Findings

01  Biased-DMT achieves linear speedup with respect to the number of agents.

02  Under absolute bias, it converges to the exact error floor.

03  Numerical experiments confirm the theoretical robustness and effectiveness.

Abstract

Decentralized stochastic optimization has emerged as a fundamental paradigm for large-scale machine learning. However, practical implementations often rely on biased gradient estimators arising from communication compression or inexact local oracles, which severely degrade convergence in the presence of data heterogeneity. To address the challenge, we propose Decentralized Momentum Tracking with Biased Gradients (Biased-DMT), a novel decentralized algorithm designed to operate reliably under biased gradient information. We establish a comprehensive convergence theory for Biased-DMT in nonconvex settings and show that it achieves linear speedup with respect to the number of agents. The theoretical analysis shows that Biased-DMT decouples the effects of network topology from data heterogeneity, enabling robust performance even in sparse communication networks. Notably, when the gradient…
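
To make the setting concrete, here is a minimal sketch of a generic momentum-plus-gradient-tracking recursion driven by a biased gradient estimator. It is not the authors' Biased-DMT update rule, which this page does not state. Everything in it is an assumption chosen for illustration: the ring topology with uniform 1/3 weights, the quadratic local objectives, top-k sparsification as the source of bias, and the names `ring_mixing_matrix`, `top_k`, and `run`.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest (a biased compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def run(n=8, d=10, k=3, steps=500, lr=0.05, beta=0.9, noise=0.1, seed=0):
    """Generic momentum tracking on f_i(x) = 0.5 * ||x - b_i||^2 (hypothetical problem)."""
    rng = np.random.default_rng(seed)
    b = rng.normal(size=(n, d))  # a different target per agent: data heterogeneity
    W = ring_mixing_matrix(n)
    x = np.zeros((n, d))         # row i is agent i's local model

    def biased_grad(x):
        # Noisy local gradients, then top-k sparsification makes the estimate biased.
        g = x - b + noise * rng.normal(size=x.shape)
        return np.stack([top_k(g[i], k) for i in range(n)])

    m = biased_grad(x)  # local momentum buffers
    y = m.copy()        # trackers of the network-average momentum
    for _ in range(steps):
        x = W @ (x - lr * y)                          # descend, then average with neighbors
        m_new = beta * m + (1 - beta) * biased_grad(x)
        y = W @ (y + m_new - m)                       # tracking correction
        m = m_new
    return x, y, m


if __name__ == "__main__":
    x, y, m = run()
    print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
```

Because the mixing matrix is doubly stochastic and the trackers start equal to the momentum buffers, the average of `y` over agents stays equal to the average of `m` at every step. That invariant is what lets each agent follow the network-wide direction rather than only its own heterogeneous gradient.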
