Gradient Masked Federated Optimization

Irene Tenison; Sreya Francis; Irina Rish

arXiv:2104.10322·cs.LG·April 22, 2021

TL;DR

This paper introduces a modified federated learning algorithm that uses masked gradients to improve out-of-distribution generalization across clients with diverse data, addressing limitations of FedAVG.

Contribution

It proposes a novel FedAVG modification with masked gradients to enhance model robustness and generalization in federated settings.

Findings

01 Achieves better out-of-distribution accuracy than FedAVG.

02 Improves model robustness across diverse client data distributions.

03 Addresses poor generalization caused by sewed optima in FedAVG.

Abstract

Federated Averaging (FedAVG) has become the most popular federated learning algorithm due to its simplicity and low communication overhead. We use simple examples to show that FedAVG tends to sew together the optima across the participating clients. These sewed optima generalize poorly when used on a new client with a new data distribution. Inspired by the invariance principles of Arjovsky et al. (2019) and Parascandolo et al. (2020), we focus on learning a model that is locally optimal across the different clients simultaneously. We propose a modification to the FedAVG algorithm that computes masked gradients (the AND-mask of Parascandolo et al., 2020) across the clients and uses them to carry out an additional server model update. We show that this algorithm achieves better out-of-distribution accuracy than FedAVG, especially when the data is non-identically distributed…
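
The masked server update mentioned in the abstract can be illustrated with a minimal sketch. It assumes each client sends a flat gradient (or pseudo-gradient) vector to the server, and that the AND-mask keeps only the coordinates whose gradient signs agree across clients at or above a threshold. The function names `and_mask` and `masked_server_step`, the parameters `agreement_threshold` and `server_lr`, and the way the step is applied are illustrative assumptions. The abstract does not specify how the paper combines this step with the usual FedAVG averaging.

```python
import numpy as np


def and_mask(client_grads, agreement_threshold=1.0):
    """Return a boolean mask over parameters (hypothetical helper).

    client_grads: array-like of shape (num_clients, num_params).
    A coordinate is kept when the absolute mean of the gradient signs
    across clients is at least `agreement_threshold`; 1.0 means every
    client must agree on the sign.
    """
    grads = np.asarray(client_grads, dtype=float)
    agreement = np.abs(np.mean(np.sign(grads), axis=0))
    return agreement >= agreement_threshold


def masked_server_step(global_weights, client_grads,
                       server_lr=1.0, agreement_threshold=1.0):
    """Apply one masked gradient step on the server (assumed form).

    Coordinates where clients disagree on the sign are zeroed out;
    the remaining coordinates use the plain mean of client gradients.
    """
    grads = np.asarray(client_grads, dtype=float)
    mask = and_mask(grads, agreement_threshold)
    masked_mean = mask * grads.mean(axis=0)
    return np.asarray(global_weights, dtype=float) - server_lr * masked_mean


if __name__ == "__main__":
    # Two clients, three parameters: the clients disagree on parameter 1.
    client_grads = [[1.0, -2.0, 0.5],
                    [3.0, 2.0, 0.5]]
    print(and_mask(client_grads))
    print(masked_server_step(np.zeros(3), client_grads))
```

In this toy input the second coordinate is left unchanged because the two clients push it in opposite directions, which is the intuition behind avoiding optima that only suit individual clients.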

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning