Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification
Tzu-Ming Harry Hsu, Hang Qi, Matthew Brown

TL;DR
This paper investigates how non-identical data distributions across devices affect federated visual classification. It proposes a method for synthesizing datasets with controllable non-identicalness, performance measures for Federated Averaging, and a server-momentum mitigation strategy that significantly improves accuracy when client data are highly skewed.
Contribution
It introduces a way to synthesize datasets with a continuous range of similarity between client data distributions, performance measures for the Federated Averaging algorithm, and server momentum as a mitigation technique for non-i.i.d. data.
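The page does not spell out how the synthetic datasets are built. A common way to obtain a continuous range of identicalness is to draw class proportions from a Dirichlet distribution whose concentration parameter alpha controls the skew. The sketch below assumes a per-class variant of that idea: for each class, the fraction of its examples given to each client is drawn from Dir(alpha). Large alpha gives near-identical clients; small alpha concentrates each class on a few clients. The helper name `dirichlet_partition` and its parameters are hypothetical, and the paper's exact sampling procedure may differ.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign every example index to exactly one client, class by class.

    Hypothetical helper (assumed scheme, not the authors' code): for each class,
    the share of its examples sent to each client is drawn from Dir(alpha * 1).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class c assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn cumulative proportions into split points inside idx.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]


if __name__ == "__main__":
    labels = np.repeat(np.arange(10), 100)  # toy labels: 10 classes x 100 examples
    for alpha in (100.0, 0.1):
        parts = dirichlet_partition(labels, num_clients=5, alpha=alpha)
        # Per-client class histogram: rows are clients, columns are classes.
        print(alpha, np.stack([np.bincount(labels[p], minlength=10) for p in parts]))
```

Comparing the two histograms shows the intended effect: with alpha = 100 each row is close to uniform, while with alpha = 0.1 most rows are dominated by a handful of classes.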
Findings
Performance degrades as the data distributions across clients become more different.
Server momentum improves classification accuracy in skewed data settings.
With server momentum, accuracy on CIFAR-10 in the most skewed setting increased from 30.1% to 76.9%.
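Server momentum can be sketched as a heavy-ball update applied on the server to the round's averaged client update. This assumes the rule v ← βv + Δw, w ← w − ηv, where Δw is the sample-weighted mean of (global weights − client weights) for the round. The function name and the defaults β = 0.9, η = 1 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def server_momentum_step(global_w, velocity, avg_delta, beta=0.9, lr=1.0):
    """Apply one server update with momentum (assumed FedAvgM-style rule).

    avg_delta: sample-weighted mean of (global_w - client_w) for this round.
    Rule: v <- beta * v + avg_delta; w <- w - lr * v.
    """
    velocity = beta * velocity + avg_delta
    new_w = global_w - lr * velocity
    return new_w, velocity
```

Setting `beta=0.0` and `lr=1.0` reduces the step to plain Federated Averaging, so the same server loop can run both the baseline and the momentum variant.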
Abstract
Federated Learning enables visual models to be trained in a privacy-preserving way using real-world data from mobile devices. Given their distributed nature, the statistics of the data across these devices are likely to differ significantly. In this work, we look at the effect such non-identical data distributions have on visual classification via Federated Learning. We propose a way to synthesize datasets with a continuous range of identicalness and provide performance measures for the Federated Averaging algorithm. We show that performance degrades as distributions differ more, and propose a mitigation strategy via server momentum. Experiments on CIFAR-10 demonstrate improved classification performance over a range of non-identicalness, with classification accuracy improved from 30.1% to 76.9% in the most skewed settings.
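Federated Averaging alternates local training on each client with a size-weighted average on the server. The sketch below is a toy instance under stated assumptions: a linear least-squares model stands in for the CNN used on CIFAR-10, every client participates in every round (no client sampling), and `local_sgd` and `fedavg_round` are hypothetical helper names rather than the authors' code.

```python
import numpy as np


def local_sgd(w, X, y, epochs, batch_size, lr, rng):
    """Client update: minibatch SGD on 0.5 * mean squared error of a linear model."""
    w = w.copy()
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]
            # Gradient of 0.5 * mean((X w - y)^2) over the minibatch.
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w


def fedavg_round(global_w, clients, epochs=1, batch_size=10, lr=0.1, seed=0):
    """One round: each client trains from global_w; the server takes a size-weighted mean."""
    rng = np.random.default_rng(seed)
    client_ws = [local_sgd(global_w, X, y, epochs, batch_size, lr, rng) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    return np.tensordot(sizes / sizes.sum(), np.stack(client_ws), axes=1)


if __name__ == "__main__":
    data_rng = np.random.default_rng(42)
    w_true = np.array([2.0, -1.0])
    clients = []
    for n in (30, 50, 20):  # toy clients with different dataset sizes
        X = data_rng.normal(size=(n, 2))
        clients.append((X, X @ w_true))
    w = np.zeros(2)
    for r in range(100):
        w = fedavg_round(w, clients, seed=r)
    print(w)
```

Weighting by the number of local examples follows the standard FedAvg aggregation. The toy data here are noiseless and share one true weight vector, so all clients agree on the optimum; the non-identical setting studied in the paper is precisely the case where they do not.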
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Mobile Crowdsensing and Crowdsourcing
