Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits
Sunwoo Lee, Anit Kumar Sahu, Chaoyang He, and Salman Avestimehr

TL;DR
This paper introduces a partial model averaging approach in federated learning that reduces model discrepancy among workers, leading to faster convergence and higher validation accuracy compared to traditional full averaging.
Contribution
It proposes a novel partial averaging framework that maintains model similarity across workers, improving global loss minimization in federated learning.
Findings
Partial averaging achieves up to 2.2% higher validation accuracy than full averaging.
Partial averaging reduces model discrepancy among workers.
Partial averaging speeds up convergence in federated learning.
Abstract
Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm runs SGD independently on multiple workers and periodically averages the model across all the workers. When local SGD runs with many workers, however, the periodic averaging causes a significant model discrepancy across the workers, which makes the global loss converge slowly. While recent advanced optimization methods address this issue with a focus on non-IID settings, the model discrepancy issue remains because of the underlying periodic model averaging. We propose a partial model averaging framework that mitigates the model discrepancy issue in Federated Learning. Partial averaging encourages the local models to stay close to each other in parameter space, which makes it possible to minimize the global loss more effectively. Given a fixed number of…
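The contrast between periodic full averaging and partial averaging can be illustrated with a small sketch. The abstract above does not spell out the exact partial averaging rule, so the following is only one plausible reading, stated as an assumption: the parameter vector is split into `period` chunks and one chunk is averaged after every local step, so each coordinate is still averaged once per period. The toy quadratic objective, the function names, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import random


def full_average(models):
    """Return new local models where every coordinate is the cross-worker mean."""
    n, dim = len(models), len(models[0])
    mean = [sum(m[j] for m in models) / n for j in range(dim)]
    return [list(mean) for _ in range(n)]


def partial_average(models, indices):
    """Average only the coordinates in `indices` across workers, in place."""
    n = len(models)
    for j in indices:
        mean_j = sum(m[j] for m in models) / n
        for m in models:
            m[j] = mean_j


def discrepancy(models):
    """Mean squared distance of the local models from their average."""
    n, dim = len(models), len(models[0])
    mean = [sum(m[j] for m in models) / n for j in range(dim)]
    return sum((m[j] - mean[j]) ** 2 for m in models for j in range(dim)) / n


def peak_discrepancy(mode, workers=8, dim=4, period=4, steps=40, lr=0.1, seed=0):
    """Run local SGD and return the largest discrepancy seen after any step."""
    rng = random.Random(seed)
    # Hypothetical non-IID toy problem: worker i minimizes 0.5 * ||x - target_i||^2.
    targets = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(workers)]
    models = [[0.0] * dim for _ in range(workers)]
    # Assumed partition: chunk k holds coordinates k, k + period, k + 2 * period, ...
    chunks = [list(range(k, dim, period)) for k in range(period)]
    peak = 0.0
    for t in range(1, steps + 1):
        # One local SGD step per worker, with a small noise term on the gradient.
        for m, tgt in zip(models, targets):
            for j in range(dim):
                grad = (m[j] - tgt[j]) + rng.gauss(0.0, 0.1)
                m[j] -= lr * grad
        if mode == "full":
            # Periodic full averaging: synchronize everything every `period` steps.
            if t % period == 0:
                models = full_average(models)
        else:
            # Assumed partial scheme: synchronize one chunk after every step.
            partial_average(models, chunks[t % period])
        peak = max(peak, discrepancy(models))
    return peak
```

Calling `peak_discrepancy("full")` and `peak_discrepancy("partial")` with the same seed compares the two schemes on identical data and noise. Both synchronize each coordinate once per period, so the amount of data averaged per period is the same; the partial variant only staggers it, which keeps the local models from all drifting for a full period at once.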
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization
Methods: Local SGD · Stochastic Gradient Descent
