Learnings from Federated Learning in the Real world
Christophe Dupuy, Tanya G. Roosta, Leo Long, Clement Chung, Rahul Gupta, Salman Avestimehr

TL;DR
This paper investigates the challenges of applying federated learning (FL) to real-world data. It focuses on heterogeneous data and on the skewed distribution of data across devices, and shows that non-uniform device sampling improves model performance on natural language understanding (NLU) tasks.
Contribution
It introduces non-uniform device selection strategies in federated learning and shows their effectiveness in improving NLU model performance on real-world data.
Findings
Non-uniform device sampling boosts FL model performance.
Sampling based on device data volume accelerates convergence.
Non-uniform sampling outperforms uniform methods in continual FL.
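One simple intuition for why volume-based sampling can help is how much data each round sees. Assume a round draws K devices independently with replacement from N devices, and device k holds n_k interactions. This setup and its notation are illustrative assumptions, not the paper's own analysis. Under it, sampling proportional to data volume never sees fewer expected interactions per round than uniform sampling. The inequality follows from Cauchy-Schwarz, since (Σ n_k)² ≤ N Σ n_k².

```latex
p_k^{\text{unif}} = \frac{1}{N}, \qquad
p_k^{\text{prop}} = \frac{n_k}{\sum_{j=1}^{N} n_j}

\mathbb{E}[\text{interactions per round}] = K \sum_{k=1}^{N} p_k \, n_k

K \sum_{k=1}^{N} p_k^{\text{prop}} n_k
  = K \, \frac{\sum_{k} n_k^{2}}{\sum_{k} n_k}
  \;\ge\; K \, \frac{\sum_{k} n_k}{N}
  = K \sum_{k=1}^{N} p_k^{\text{unif}} n_k
```

The gap between the two sides grows with the skew between "heavy" and "light" devices. It vanishes only when all devices hold the same amount of data.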
Abstract
Federated Learning (FL) applied to real-world data may suffer from several idiosyncrasies. One such idiosyncrasy is the data distribution across devices. Data across devices could be distributed such that there are some "heavy devices" with large amounts of data while there are many "light users" with only a handful of data points. There also exists heterogeneity of data across devices. In this study, we evaluate the impact of such idiosyncrasies on Natural Language Understanding (NLU) models trained using FL. We conduct experiments on data obtained from a large-scale NLU system serving thousands of devices and show that simple non-uniform device selection based on the number of interactions at each round of FL training boosts the performance of the model. This benefit is further amplified in continual FL on consecutive time periods, where non-uniform sampling manages to swiftly catch…
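The per-round selection step described in the abstract can be sketched as follows. This minimal sketch assumes the server knows each device's interaction count at selection time. The helper names `sampling_probs` and `select_devices`, and the device counts in the demo, are hypothetical illustrations, not the authors' implementation. The code compares uniform selection with selection weighted by interaction count.

```python
import numpy as np


def sampling_probs(interaction_counts, scheme="proportional"):
    """Per-device selection probabilities for one FL round (hypothetical helper)."""
    counts = np.asarray(interaction_counts, dtype=float)
    if scheme == "uniform":
        # Every device is equally likely, regardless of how much data it holds.
        return np.full(len(counts), 1.0 / len(counts))
    if scheme == "proportional":
        # Assumption: weight each device by its number of interactions.
        total = counts.sum()
        if total <= 0:
            raise ValueError("interaction counts must have a positive sum")
        return counts / total
    raise ValueError(f"unknown scheme: {scheme!r}")


def select_devices(interaction_counts, k, scheme="proportional", rng=None):
    """Draw k distinct device indices for one round (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    p = sampling_probs(interaction_counts, scheme)
    # Weighted sampling without replacement: no device is picked twice per round.
    return rng.choice(len(p), size=k, replace=False, p=p)


if __name__ == "__main__":
    # Hypothetical population: two "heavy" devices and six "light" ones.
    counts = [500, 300, 5, 3, 2, 1, 1, 1]
    rng = np.random.default_rng(0)
    for scheme in ("uniform", "proportional"):
        covered = [
            np.take(counts, select_devices(counts, k=2, scheme=scheme, rng=rng)).sum()
            for _ in range(1000)
        ]
        print(f"{scheme}: mean interactions per round = {np.mean(covered):.1f}")
```

NumPy's weighted sampling without replacement draws devices one at a time. As a result, a device's chance of being included in a round is only roughly proportional to its interaction count when k is large relative to the number of devices. Passing `replace=True` makes each draw exactly proportional, at the cost of occasionally selecting the same device twice in one round.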
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Human Mobility and Location-Based Analysis
