Surviving the Edge: Federated Learning under Networking and Resource Constraints
Mike Mwanje, Okemawo Obadofin, Theophilus Benson, Joao Barros

TL;DR
This paper systematically characterizes transport-layer failures in federated learning systems under network and resource constraints, revealing critical thresholds where standard TCP management fails and proposing adjustments for reliable edge deployment.
Contribution
The paper presents the first empirical analysis of transport-layer failures in FL under constrained network and compute conditions, and it demonstrates that TCP parameter tuning can improve robustness.
Findings
FL training fails at 5-second one-way latency due to TCP handshake timeouts
Training fails above 50% packet loss due to buffer exhaustion
Training fails at 90% client dropout rates
Adjusting TCP parameters reduces training time in high-latency scenarios
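To make the interaction between latency and connection-setup timeouts concrete, the following is a minimal sketch of a simplified handshake model. It is not the authors' analysis. It assumes a connecting client retransmits its SYN with a timeout that doubles on each retry and gives up after a fixed number of retries. The function names, the 1-second initial timeout, and the retry counts are illustrative assumptions, not values taken from the paper, and the model ignores application-level deadlines and packet loss.

```python
def syn_retransmit_times(initial_rto_s, retries):
    """Times (in seconds) at which retransmitted SYNs are sent.

    Assumes the retransmission timeout doubles after every retry
    (simplified exponential backoff; hypothetical parameters).
    """
    times = []
    elapsed = 0.0
    rto = initial_rto_s
    for _ in range(retries):
        elapsed += rto
        times.append(elapsed)
        rto *= 2
    return times


def handshake_budget_s(initial_rto_s, retries):
    """Total time before the connection attempt is abandoned.

    Includes the final wait after the last retransmission, so the
    budget is initial_rto * (2^(retries + 1) - 1).
    """
    return initial_rto_s * (2 ** (retries + 1) - 1)


def handshake_fits(one_way_latency_s, initial_rto_s, retries):
    """True if the SYN-ACK (one round trip) can arrive within the budget."""
    round_trip_s = 2 * one_way_latency_s
    return round_trip_s < handshake_budget_s(initial_rto_s, retries)


if __name__ == "__main__":
    # Hypothetical settings: 1 s initial timeout, 5 s one-way latency.
    for retries in (2, 6):
        print(retries,
              handshake_budget_s(1.0, retries),
              handshake_fits(5.0, 1.0, retries))
```

Under these assumed settings, a 5-second one-way latency (10-second round trip) does not fit a 7-second budget from 2 retries but does fit a 127-second budget from 6 retries, which illustrates why a larger retry budget can matter on high-latency links.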
Abstract
Motivated by the growing proliferation of federated learning (FL) in edge environments, we present the first systematic characterization of transport-layer breaking points in FL systems operating under conditions of highly constrained network and compute resources. Using a reproducible testbed with chaos engineering tools, we evaluate Flower under progressively degraded network conditions representative of resource-constrained deployments in Africa and similar environments. Our empirical investigation reveals a fundamental mismatch between FL's burst-idle communication pattern and standard TCP connection management. We identify precise operational boundaries: FL training catastrophically fails at 5-second one-way latency due to TCP handshake timeouts, above 50% packet loss due to buffer exhaustion, and with 90% client dropout rates. Through systematic analysis of connection patterns…
