Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits

Sunwoo Lee, Anit Kumar Sahu, Chaoyang He, and Salman Avestimehr

arXiv:2201.03789 · cs.LG · January 12, 2022

Open Access

TL;DR

This paper introduces a partial model averaging approach in federated learning that reduces model discrepancy among workers, leading to faster convergence and higher validation accuracy compared to traditional full averaging.

Contribution

It proposes a novel partial averaging framework that maintains model similarity across workers, improving global loss minimization in federated learning.

Findings

01. Partial averaging achieves up to 2.2% higher validation accuracy than full averaging.

02. Partial averaging reduces model discrepancy among workers.

03. Partial averaging speeds up convergence in federated learning.

Abstract

Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm runs SGD independently on multiple workers and periodically averages the model across all of them. When local SGD runs with many workers, however, the periodic averaging causes significant model discrepancy across the workers, which makes the global loss converge slowly. Recent advanced optimization methods address this issue with a focus on non-IID settings, but the model discrepancy remains because of the underlying periodic model averaging. We propose a partial model averaging framework that mitigates the model discrepancy issue in Federated Learning. Partial averaging encourages the local models to stay close to each other in parameter space, which makes it possible to minimize the global loss more effectively. Given a fixed number of…
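The contrast between periodic full averaging and partial averaging can be illustrated with a small toy simulation. This is a minimal sketch and not the authors' implementation: the abstract does not say how the model is partitioned or scheduled, so the round-robin schedule below (one of `tau` coordinate groups averaged per iteration) is an assumption, as are the quadratic toy loss and all names (`full_average`, `partial_average`, `discrepancy`, `run`).

```python
import random


def full_average(models):
    """Full averaging: every worker receives the mean of all local models."""
    n, dim = len(models), len(models[0])
    mean = [sum(m[j] for m in models) / n for j in range(dim)]
    return [list(mean) for _ in models]


def partial_average(models, indices):
    """Average only the coordinates in `indices` across workers (in place)."""
    n = len(models)
    for j in indices:
        mean_j = sum(m[j] for m in models) / n
        for m in models:
            m[j] = mean_j
    return models


def discrepancy(models):
    """Mean squared distance of the local models from their average."""
    n, dim = len(models), len(models[0])
    mean = [sum(m[j] for m in models) / n for j in range(dim)]
    return sum((m[j] - mean[j]) ** 2 for m in models for j in range(dim)) / n


def local_sgd_step(model, target, lr, rng, noise=0.1):
    """One noisy gradient step on the toy loss 0.5 * ||x - target||^2."""
    return [x - lr * ((x - t) + rng.gauss(0.0, noise))
            for x, t in zip(model, target)]


def run(scheme, num_workers=8, dim=8, tau=4, steps=40, lr=0.1, seed=0):
    """Simulate local SGD with 'full' or 'partial' averaging (toy setup)."""
    rng = random.Random(seed)
    # Each worker has its own optimum, a stand-in for heterogeneous local data.
    targets = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(num_workers)]
    models = [[0.0] * dim for _ in range(num_workers)]
    # Assumed schedule: split the coordinates into tau groups, one per iteration.
    groups = [list(range(g, dim, tau)) for g in range(tau)]
    trace = []
    for t in range(1, steps + 1):
        models = [local_sgd_step(m, tg, lr, rng)
                  for m, tg in zip(models, targets)]
        if scheme == "full":
            if t % tau == 0:  # synchronize the whole model every tau steps
                models = full_average(models)
        else:
            # synchronize one group of coordinates at every step
            models = partial_average(models, groups[t % tau])
        trace.append(discrepancy(models))
    return models, trace


if __name__ == "__main__":
    for scheme in ("full", "partial"):
        _, trace = run(scheme)
        print(scheme, "max discrepancy:", max(trace))
```

Under this assumed schedule both schemes communicate the same number of parameters per `tau` iterations; the partial scheme spreads that communication over every iteration instead of concentrating it in one synchronization step. The printed values describe only this toy problem and say nothing about the paper's reported results.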

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization

Methods: Local SGD · Stochastic Gradient Descent