The Non-IID Data Quagmire of Decentralized Machine Learning
Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons

TL;DR
This paper investigates the challenges of decentralized machine learning when data distributions are skewed across devices. It shows that such skew causes significant accuracy loss, and it proposes adaptive communication strategies and normalization techniques to mitigate the problem.
Contribution
The paper provides a comprehensive experimental analysis of how data skew affects decentralized DNN training and introduces SkewScout, an adaptive system that improves accuracy under skewed data.
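To make "adaptive communication" concrete, here is a minimal sketch of a controller that tunes how many local steps each device runs between synchronizations, based on an accuracy-loss signal. This is a hypothetical illustration, not SkewScout's actual algorithm. The class name `CommController`, the doubling/halving rule, the thresholds, and the assumption that an accuracy gap between partitions is available as input are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class CommController:
    """Hypothetical controller (not SkewScout's real logic): picks the number
    of local steps between synchronizations from a measured accuracy gap."""
    local_steps: int = 64          # current number of local steps between syncs
    min_steps: int = 1             # floor: 1 means sync after every step
    max_steps: int = 512           # ceiling on steps between syncs
    loss_threshold: float = 0.05   # tolerated accuracy gap (assumed 5 points)

    def update(self, accuracy_gap: float) -> int:
        """accuracy_gap: assumed signal, e.g. a model's accuracy on its own
        partition minus its accuracy on another partition's data."""
        if accuracy_gap > self.loss_threshold:
            # Local models appear to diverge: communicate more often.
            self.local_steps = max(self.min_steps, self.local_steps // 2)
        elif accuracy_gap < self.loss_threshold / 2:
            # Gap is comfortably small: save bandwidth by communicating less.
            self.local_steps = min(self.max_steps, self.local_steps * 2)
        # Gaps between the two thresholds leave the interval unchanged.
        return self.local_steps
```

Feeding the gaps 0.10, 0.08, 0.01, 0.03 into a fresh controller moves the interval from 64 to 32, 16, back to 32, and then holds it at 32. The hysteresis band between the two thresholds keeps the interval from oscillating on every measurement.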
Findings
Data skew causes significant accuracy loss in decentralized learning.
Batch normalization models are particularly affected by data skew.
Adaptive communication frequency can mitigate accuracy degradation.
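Batch normalization is sensitive to skew because its training-time statistics come from the minibatch. Under label skew, each device's minibatches follow a different distribution. The sketch below is a generic NumPy illustration, not the paper's code. It normalizes the same sample inside two different minibatches and compares the result with Group Normalization, a per-sample alternative listed in the taxonomy below. The affine parameters are omitted, and the shifted companion batch is an assumed stand-in for another device's data.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Training-mode batch norm for x of shape (N, C, H, W), no affine params.
    Statistics span the batch axis, so each output depends on the whole batch."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def group_norm(x, num_groups, eps=1e-5):
    """Group norm, no affine params: statistics are computed per sample over
    groups of channels, so outputs never depend on other samples."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)


rng = np.random.default_rng(0)
sample = rng.normal(size=(1, 4, 2, 2))
# Same sample, different companions: a stand-in for devices whose
# label-skewed data have different feature statistics.
batch_a = np.concatenate([sample, rng.normal(loc=0.0, size=(7, 4, 2, 2))])
batch_b = np.concatenate([sample, rng.normal(loc=3.0, size=(7, 4, 2, 2))])

bn_diff = np.abs(batch_norm(batch_a)[0] - batch_norm(batch_b)[0]).max()
gn_diff = np.abs(group_norm(batch_a, 2)[0] - group_norm(batch_b, 2)[0]).max()
print(f"BN output change: {bn_diff:.3f}, GN output change: {gn_diff:.3f}")
```

The batch-norm output for the shared sample shifts with its batch companions, while the group-norm output does not change at all.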
Abstract
Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and…
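The abstract's setting, a skewed distribution of data labels across devices, is commonly simulated by partitioning a labeled dataset so that each partition sees mostly its own subset of classes. The following is a minimal sketch under assumptions. The function name `partition_by_label`, the round-robin class assignment, and the `skew_fraction` knob are illustrative and are not necessarily the paper's exact partitioning scheme.

```python
import random
from collections import defaultdict


def partition_by_label(labels, num_partitions, skew_fraction, seed=0):
    """Hypothetical label-skew partitioner.

    Classes are assigned round-robin to "home" partitions. A skew_fraction of
    each class's examples goes to its home partition; the rest is spread
    uniformly at random (IID) over all partitions.
    skew_fraction=1.0 gives disjoint label sets; 0.0 gives an IID split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    parts = [[] for _ in range(num_partitions)]
    for k, cls in enumerate(sorted(by_class)):
        idxs = by_class[cls][:]
        rng.shuffle(idxs)
        cut = round(skew_fraction * len(idxs))
        parts[k % num_partitions].extend(idxs[:cut])    # skewed share
        for idx in idxs[cut:]:                          # IID remainder
            parts[rng.randrange(num_partitions)].append(idx)
    return parts
```

For example, with ten balanced classes and two partitions, `skew_fraction=1.0` gives one partition the even-numbered classes and the other the odd-numbered ones. Intermediate values interpolate between that extreme and an IID split.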
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Age of Information Optimization
Methods: Group Normalization · Batch Normalization
