Analyzing the Impact of Participant Failures in Cross-Silo Federated Learning
Fabian Stricker, David Bermbach, Christian Zirpins

TL;DR
This paper investigates how participant failures affect model quality in cross-silo federated learning, where a small number of organizations collaborate, and shows that the outcome depends on failure timing, data skew, and how the model is evaluated.
Contribution
The paper provides an extensive analysis of how participant failures affect model quality in cross-silo FL, a setting that has received far less attention than cross-device FL. It focuses on generally influential factors such as failure timing, data skew, and evaluation bias.
Findings
High data skew leads to overly optimistic evaluation results.
Timing of participant failures significantly affects model quality.
Evaluation methods can mask the true impact of failures.
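The first and third findings can be illustrated with a toy sketch. It is not the authors' experimental setup. It assumes a scalar "model" formed as a size-weighted average of per-silo means (a FedAvg-style aggregation, which the summary does not name), and the silo names, data values, and helper functions are hypothetical. When a silo holding skewed data fails, an evaluation that uses only the surviving silos' data reports a much lower error than an evaluation over all silos.

```python
def aggregate(updates, sizes, alive):
    """Size-weighted average over the participants that reported (assumed FedAvg-style)."""
    total = sum(sizes[p] for p in alive)
    return sum(updates[p] * sizes[p] for p in alive) / total


def mse(model, data):
    """Mean squared error of a scalar model against a list of values."""
    return sum((x - model) ** 2 for x in data) / len(data)


# Hypothetical silos with strongly skewed local data: silo C differs from A and B.
silos = {
    "A": [1.0, 1.0, 1.0, 1.0],
    "B": [1.0, 1.0, 3.0, 3.0],
    "C": [9.0, 9.0, 9.0, 9.0],
}

# Each silo's local "update" is simply the mean of its own data.
updates = {p: sum(d) / len(d) for p, d in silos.items()}
sizes = {p: len(d) for p, d in silos.items()}

# Silo C fails, so only A and B contribute to the global model.
alive = ["A", "B"]
model = aggregate(updates, sizes, alive)

# Evaluate once on the survivors' data only, and once on every silo's data.
survivor_data = [x for p in alive for x in silos[p]]
all_data = [x for d in silos.values() for x in d]
err_survivors = mse(model, survivor_data)
err_all = mse(model, all_data)

print(f"model={model}, survivors-only MSE={err_survivors}, all-silos MSE={err_all}")
```

In this sketch the survivors-only error is far lower than the all-silos error, because the failed silo's data is missing from both training and evaluation. This is one way an evaluation method can hide the cost of a failure.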
Abstract
Federated learning (FL) is a new paradigm for training machine learning (ML) models without sharing data. While applying FL in cross-silo scenarios, where organizations collaborate, it is necessary that the FL system is reliable; however, participants can fail due to various reasons (e.g., communication issues or misconfigurations). In order to provide a reliable system, it is necessary to analyze the impact of participant failures. While this problem received attention in cross-device FL where mobile devices with limited resources participate, there is comparatively little research in cross-silo FL. Therefore, we conduct an extensive study for analyzing the impact of participant failures on the model quality in the context of inter-organizational cross-silo FL with few participants. In our study, we focus on analyzing generally influential factors such as the impact of the timing and…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Software System Performance and Reliability
