Performance Improvement of Federated Learning Server using Smart NIC
Naoki Shibahara, Michihiro Koibuchi, Hiroki Matsutani

TL;DR
This paper enhances federated learning server performance by offloading aggregation tasks to a smart NIC (NVIDIA BlueField-2 DPU) with parallel processing, achieving faster execution with minimal accuracy loss.
Contribution
It introduces a DPDK-based offloading method on a smart NIC for federated learning servers, reducing processing time and improving scalability.
Findings
Execution time improved by a factor of 1.39
Negligible accuracy loss observed
Parallel processing on DPU accelerates aggregation
Abstract
Federated learning is a distributed machine learning approach in which weight parameters trained locally by clients are aggregated into global parameters by a server. The global parameters can be trained without uploading the clients' privacy-sensitive raw data to the server. Aggregation on the server is done simply by averaging the local weight parameters, so it is an I/O-intensive task in which network processing accounts for a large share of the work compared with computation. The network processing workload grows further as the number of clients increases. To mitigate this workload, in this paper the federated learning server is offloaded to an NVIDIA BlueField-2 DPU, a smart NIC (Network Interface Card) with eight processing cores. Dedicated processing cores are assigned by DPDK (Data Plane Development Kit) for receiving the local weight parameters…
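The averaging step the abstract describes can be illustrated with a minimal sketch. It assumes each client sends a flat list of floats of the same length and that the server takes an unweighted element-wise mean. The names `average_weights` and `client_weights` are hypothetical and do not come from the paper, and the sketch leaves out the DPDK packet reception and the DPU offloading entirely.

```python
from typing import List, Sequence


def average_weights(client_weights: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of the clients' local weight vectors.

    Assumption: every client sends a flat vector of the same length and
    all clients count equally (plain, unweighted averaging).
    """
    if not client_weights:
        raise ValueError("need at least one client")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must send the same number of parameters")
    n_clients = len(client_weights)
    # zip(*...) groups the i-th parameter of every client together,
    # so each column is summed and divided by the number of clients.
    return [sum(column) / n_clients for column in zip(*client_weights)]


if __name__ == "__main__":
    # Two hypothetical clients, three parameters each.
    local = [[0.5, 1.0, -2.0], [1.5, 3.0, 0.0]]
    print(average_weights(local))
```

Each parameter costs only one addition per client, which is consistent with the abstract's point that receiving the parameters, rather than computing the average, dominates the server's work.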
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Brain Tumor Detection and Classification
