Understanding Communication Backends in Cross-Silo Federated Learning
Amir Ziashahabi, Chaoyang He, Salman Avestimehr

TL;DR
This paper benchmarks various communication backends in cross-silo federated learning, introduces a hybrid gRPC+S3 backend, and offers practical insights for optimizing FL system performance.
Contribution
The paper provides in-depth benchmarks of existing communication backends (MPI, gRPC, and PyTorch RPC) and introduces gRPC+S3, a hybrid backend intended to improve large-model transmission in geo-distributed FL.
Findings
gRPC+S3 achieves up to 3.8x speedup over gRPC.
Benchmarks cover point-to-point and end-to-end performance.
Insights assist in selecting suitable communication backends for FL.
Abstract
Federated learning (FL) has emerged as a practical means for privacy-preserving distributed machine learning. FL's versatile design makes it suitable for various training settings, from IoT edge devices in cross-device FL to powerful servers in cross-silo FL. A key consequence of this versatility is the high level of diversity found in the networking configuration of FL applications. Coupled with the rising demand for large-scale models such as large language models, well-informed selection and configuration of communication backends become crucial for ensuring optimal performance in FL systems. This work focuses on cross-silo federated learning, presenting in-depth benchmarks of various communication backends, including MPI, gRPC, and PyTorch RPC. In addition, we introduce gRPC+S3, a hybrid backend designed to overcome the limitations of existing approaches, particularly for…
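As a rough illustration of what a hybrid backend combining an RPC channel with object storage could look like, the sketch below sends the large serialized model through a store and passes only a small reference message over the control channel. The text above does not spell out how gRPC+S3 is built, so this pattern is an assumption and not the authors' implementation. `InMemoryObjectStore`, `ControlChannel`, `send_model`, and `receive_model` are hypothetical names, and both classes are in-memory stand-ins rather than real S3 or gRPC clients.

```python
import pickle
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryObjectStore:
    """Hypothetical stand-in for an object store such as S3."""
    blobs: dict = field(default_factory=dict)

    def put(self, payload: bytes) -> str:
        # Store the payload under a fresh random key and return that key.
        key = uuid.uuid4().hex
        self.blobs[key] = payload
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


@dataclass
class ControlChannel:
    """Hypothetical stand-in for a small-message RPC channel such as gRPC."""
    inbox: list = field(default_factory=list)

    def send(self, message: dict) -> None:
        self.inbox.append(message)

    def receive(self) -> dict:
        # Messages are delivered in the order they were sent.
        return self.inbox.pop(0)


def send_model(weights, store, channel, sender_id):
    """Upload the serialized weights, then send only a small reference."""
    payload = pickle.dumps(weights)
    key = store.put(payload)
    channel.send({"sender": sender_id, "key": key, "num_bytes": len(payload)})


def receive_model(store, channel):
    """Read the reference from the channel, then fetch the bulk payload."""
    message = channel.receive()
    payload = store.get(message["key"])
    return message["sender"], pickle.loads(payload)


store = InMemoryObjectStore()
channel = ControlChannel()
send_model({"layer1": [0.1, 0.2], "layer2": [0.3]}, store, channel, sender_id=1)
sender, weights = receive_model(store, channel)
```

In this sketch the control message stays a few dozen bytes no matter how large the model is, which is the property that would make such a split attractive for large-model transfer. A real deployment would replace the stand-ins with actual storage and RPC clients and would use a safer serialization format than `pickle` for untrusted peers.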
