Non-IID data in Federated Learning: A Survey with Taxonomy, Metrics, Methods, Frameworks and Future Directions
Daniel M. Jimenez G., David Solans, Mikko Heikkila, Andrea Vitaletti, Nicolas Kourtellis, Aris Anagnostopoulos, Ioannis Chatzigiannakis

TL;DR
This survey reviews the challenges that non-IID data poses to federated learning, covering a taxonomy, heterogeneity metrics, solutions, and future research directions, and highlights the need for standardized frameworks and a better understanding of non-IID data.
Contribution
It provides a detailed taxonomy of non-IID data, partition protocols, and metrics for quantifying data heterogeneity in federated learning, and reviews existing solutions and frameworks, filling a gap in current research and guiding future studies.
Findings
Taxonomy for non-IID data and partition protocols
Metrics for quantifying data heterogeneity
Overview of solutions and frameworks for non-IID FL
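As a concrete illustration of what a partition protocol does, the sketch below simulates label-distribution skew, one common way of producing non-IID client splits in experiments. For each class, the share of that class's samples given to each client is drawn from a symmetric Dirichlet distribution with concentration `alpha`: small `alpha` yields heavily skewed clients, while large `alpha` approaches an IID split. This is a minimal sketch under that assumption, not the survey's own protocol; the helper names `sample_dirichlet` and `dirichlet_label_partition` are hypothetical.

```python
import random
from collections import defaultdict


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> list[float]:
    """Draw one sample from a symmetric Dirichlet(alpha, ..., alpha) over k outcomes
    by normalizing k independent Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_label_partition(labels: list[int], num_clients: int, alpha: float,
                              seed: int = 0) -> list[list[int]]:
    """Hypothetical helper: split sample indices among clients with label skew.

    For every class, that class's indices are shuffled and cut into consecutive
    chunks whose sizes follow proportions drawn from Dirichlet(alpha).
    Returns one list of sample indices per client.
    """
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients: list[list[int]] = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        props = sample_dirichlet(alpha, num_clients, rng)
        n = len(indices)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += props[c]
            # Cumulative cut points keep chunks contiguous; the last client takes
            # the remainder, so every index is assigned exactly once.
            end = n if c == num_clients - 1 else min(n, int(round(cumulative * n)))
            clients[c].extend(indices[start:end])
            start = end
    return clients


# Example: a toy dataset with 10 classes x 100 samples, split across 5 clients.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_label_partition(labels, num_clients=5, alpha=0.3, seed=42)
print([len(p) for p in parts])  # client sizes differ because of the label skew
```

Cutting each class with cumulative proportions, rather than rounding each share independently, guarantees that the chunks never overlap and that no sample is dropped. Raising `alpha` (for example to 100) makes each client's label histogram close to the global one.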
Abstract
Recent advances in machine learning have highlighted Federated Learning (FL) as a promising approach that enables multiple distributed users (so-called clients) to collectively train ML models without sharing their private data. While this privacy-preserving method shows potential, it struggles when data across clients is not independent and identically distributed (non-IID). Such non-IID data remains an unsolved challenge that can degrade model performance and slow down training. Despite the significance of non-IID data in FL, there is a lack of consensus among researchers about its classification and quantification. This technical survey aims to fill that gap by providing a detailed taxonomy for non-IID data, partition protocols, and metrics to quantify data heterogeneity. Additionally, we describe popular solutions to address non-IID data and standardized frameworks…
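To make "quantifying data heterogeneity" concrete, the sketch below computes one simple label-skew score: the average Jensen-Shannon divergence (JSD) between each client's label distribution and the pooled label distribution of all clients. A score of 0 means every client matches the global distribution; larger values mean stronger skew, bounded by ln 2 when natural logarithms are used. This is offered only as an illustration, not necessarily one of the metrics the survey catalogs, and the function names are hypothetical.

```python
import math
from collections import Counter


def label_distribution(labels: list[int], classes: list[int]) -> list[float]:
    """Normalized histogram of labels over a fixed, ordered list of classes."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[c] / n for c in classes]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute 0 by convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in nats: symmetric and between 0 and ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_label_jsd(client_labels: list[list[int]]) -> float:
    """Hypothetical heterogeneity score: average JSD between each non-empty
    client's label distribution and the distribution of all labels pooled."""
    non_empty = [labels for labels in client_labels if labels]
    classes = sorted({y for labels in non_empty for y in labels})
    pooled = [y for labels in non_empty for y in labels]
    global_dist = label_distribution(pooled, classes)
    scores = [js_divergence(label_distribution(labels, classes), global_dist)
              for labels in non_empty]
    return sum(scores) / len(scores)


# Example: two toy clients with identical vs. disjoint label sets.
print(mean_label_jsd([[0, 1, 2, 3], [0, 1, 2, 3]]))  # 0.0: identical distributions
print(mean_label_jsd([[0, 0, 0, 0], [1, 1, 1, 1]]))  # about 0.216, i.e. 0.75 * ln(4/3)
```

JSD is used here instead of plain KL divergence because it stays finite when a client lacks some classes entirely, which is exactly the situation label skew produces.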
Topics: Privacy-Preserving Technologies in Data
