FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko, Santiago Silva, Maria Teleńczuk, Shadi Albarqouni, Salman Avestimehr

TL;DR
FLamby introduces a comprehensive suite of healthcare datasets and benchmarks for cross-silo federated learning, facilitating realistic research and development in privacy-sensitive medical applications.
Contribution
The paper provides the first realistic cross-silo healthcare datasets with natural splits, along with baseline benchmarks, to advance federated learning research in healthcare.
Findings
Benchmarking standard FL algorithms on healthcare datasets.
Demonstrating the diversity of tasks and modalities in FLamby.
Providing a modular toolkit for reproducible research.
Abstract
Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models, without centralizing data. The cross-silo FL setting corresponds to the case of few (2–50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, thereby slowing algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities,…
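To make the cross-silo setting concrete, here is a minimal sketch of federated averaging (FedAvg), a widely used FL baseline, on a toy one-dimensional regression problem. It does not use FLamby's code or API. The function names (`local_sgd`, `fedavg`, `make_silo`), the synthetic data, and all hyperparameters are assumptions chosen for illustration. Each "silo" keeps its own data; only model parameters are exchanged and averaged, weighted by silo size.

```python
import random


def local_sgd(w, b, data, lr=0.05, epochs=5):
    """Run a few epochs of SGD on one silo's private data; return updated (w, b)."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # residual of the linear model y ~ w*x + b
            w -= lr * err * x      # gradient step for squared error
            b -= lr * err
    return w, b


def fedavg(silos, rounds=50, lr=0.05, epochs=5):
    """Federated averaging: silos train locally, the server averages parameters."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in silos)
    for _ in range(rounds):
        # Every silo starts from the current global model; raw data never moves.
        updates = [local_sgd(w, b, d, lr, epochs) for d in silos]
        # Average the local models, weighting each silo by its sample count.
        w = sum(len(d) * u[0] for d, u in zip(silos, updates)) / total
        b = sum(len(d) * u[1] for d, u in zip(silos, updates)) / total
    return w, b


def make_silo(rng, n, x_shift):
    """Hypothetical silo: y = 2x + 1 plus small noise, with a silo-specific input shift."""
    xs = [rng.uniform(-1.0, 1.0) + x_shift for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


rng = random.Random(0)
# Three silos of different sizes and input distributions (a simple form of heterogeneity).
silos = [make_silo(rng, 40, -0.5), make_silo(rng, 60, 0.0), make_silo(rng, 100, 0.5)]
w, b = fedavg(silos)
```

In this toy problem all silos share the same underlying relationship, so the averaged model should land near slope 2 and intercept 1. Real healthcare silos typically differ in more than input distribution, which is the kind of heterogeneity that naturally split datasets are meant to expose.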
Taxonomy
Topics: Privacy-Preserving Technologies in Data
