D$^2$: Decentralized Training over Decentralized Data

Hanlin Tang; Xiangru Lian; Ming Yan; Ce Zhang; Ji Liu

arXiv:1803.07068 · cs.DC · April 23, 2018 · 185 cites

Open Access

TL;DR

This paper introduces D$^2$, a decentralized stochastic gradient descent algorithm that effectively handles large data variance across workers, improving convergence and robustness in decentralized machine learning settings.

Contribution

D$^2$ extends D-PSGD with variance reduction, making decentralized training more robust to data heterogeneity among workers.

Findings

01. D$^2$ outperforms D-PSGD in image classification tasks.

02. D$^2$ achieves faster convergence rates under high data variance.

03. Empirical results demonstrate robustness of D$^2$ to data heterogeneity.

Abstract

When training a machine learning model using multiple workers, each of which collects data from its own data sources, it would be most useful if the data collected by different workers could be unique and different. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are not too different. In this paper, we ask the question: Can we design a decentralized parallel stochastic gradient descent algorithm that is less sensitive to the data variance across workers? We present D$^2$, a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance among workers (imprecisely, "decentralized" data). The core of D$^2$ is a variance reduction extension of the standard D-PSGD algorithm, which improves the…
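The abstract describes D$^2$ only as a variance reduction extension of D-PSGD. The sketch below is one way to make that concrete; it is illustrative, not the authors' code. It assumes the update form commonly attributed to the paper's algorithm listing (not shown in this excerpt): after a plain first step, each worker forms $2x_t - x_{t-1} - \gamma(g_t - g_{t-1})$ and then averages with its neighbors through a mixing matrix $W$. The toy objective, the ring matrix, the function names, and the use of exact local gradients (no sampling noise) are all assumptions chosen to isolate the effect of data variance across workers.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric doubly stochastic matrix for a ring (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def local_grads(X, C):
    """Row i is the gradient of the toy loss f_i(x) = 0.5 * ||x - c_i||^2."""
    return X - C

def run_dpsgd(C, W, lr, steps):
    """D-PSGD-style update: average with neighbors, then take a local step."""
    X = np.zeros_like(C)
    for _ in range(steps):
        X = W @ X - lr * local_grads(X, C)
    return X

def run_d2(C, W, lr, steps):
    """D^2-style update (assumed form): correct with the previous iterate
    and the previous gradient, then average with neighbors."""
    X_prev = np.zeros_like(C)
    G_prev = local_grads(X_prev, C)
    X = W @ (X_prev - lr * G_prev)  # first step has no history to use
    for _ in range(steps - 1):
        G = local_grads(X, C)
        X_next = W @ (2 * X - X_prev - lr * (G - G_prev))
        X_prev, G_prev, X = X, G, X_next
    return X

def max_error(X, C):
    """Largest distance from any worker's model to the global minimizer,
    which for this toy loss is the mean of the c_i."""
    return float(np.max(np.linalg.norm(X - C.mean(axis=0), axis=1)))

rng = np.random.default_rng(0)
C = rng.normal(scale=10.0, size=(5, 3))  # very different local optima
W = ring_mixing_matrix(5)

dpsgd_err = max_error(run_dpsgd(C, W, lr=0.1, steps=500), C)
d2_err = max_error(run_d2(C, W, lr=0.1, steps=500), C)
print(f"D-PSGD max worker error: {dpsgd_err:.3e}")
print(f"D^2    max worker error: {d2_err:.3e}")
```

In this noise-free toy setting with a constant step size, the D-PSGD-style iterates settle at local models that stay pulled toward each worker's own optimum, while the D$^2$-style iterates reach consensus at the global minimizer because the gradient difference cancels the fixed per-worker offset. This illustrates the mechanism only and says nothing about the paper's reported rates or experiments.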

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Complexity and Algorithms in Graphs