Improving Generalization in Federated Learning by Seeking Flat Minima

Debora Caldarola; Barbara Caputo; Marco Ciccone

arXiv:2203.11834 · cs.LG · July 22, 2022

TL;DR

This paper proposes training clients with Sharpness-Aware Minimization (SAM) and averaging stochastic weights (SWA) on the server in federated learning to find flatter minima, thereby improving model generalization in heterogeneous data scenarios.

Contribution

The paper combines local client training with SAM or its adaptive version (ASAM) and server-side SWA in federated learning, improving generalization by seeking flat minima and addressing the degradation caused by heterogeneous data.

Findings

1. Significant improvement in generalization with SAM/ASAM and SWA.

2. Effective across diverse vision datasets and tasks.

3. Bridges the gap between federated and centralized models.

Abstract

Models trained in federated settings often suffer from degraded performance and fail to generalize, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of the geometry of the loss and the Hessian eigenspectrum, linking the model's lack of generalization capacity to the sharpness of the solution. Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) averaging stochastic weights (SWA) on the server side can substantially improve generalization in Federated Learning and help bridge the gap with centralized models. By seeking parameters in neighborhoods having uniformly low loss, the model converges towards flatter minima and its generalization significantly improves in both…
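
The two ingredients named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation (see the linked repository for that): the toy quadratic client losses, the function names (`sam_step`, `average`, `update_swa`, `run`), the hyperparameters (`lr`, `rho`, `swa_start`), and the uniform client weighting are all assumptions made for illustration. The SWA part is a plain running mean over late-round server models; the abstract does not specify the schedule the paper uses.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: perturb w along the normalized gradient by radius rho,
    then descend from w using the gradient taken at the perturbed point."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # stationary point: no ascent direction is defined
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def average(models):
    """Uniform average of weight vectors (FedAvg with equal client weights)."""
    n = len(models)
    return [sum(vals) / n for vals in zip(*models)]


def update_swa(swa, w, n_averaged):
    """Running mean over server models; n_averaged counts models already in swa."""
    return [s + (wi - s) / (n_averaged + 1) for s, wi in zip(swa, w)]


def make_grad(center):
    """Gradient of the toy client loss 0.5 * ||w - center||^2 (hypothetical)."""
    return lambda w: [wi - ci for wi, ci in zip(w, center)]


def run(rounds=20, local_steps=5, swa_start=15):
    # Each client has a different optimum, a crude stand-in for heterogeneity.
    centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    server = [2.0, 2.0]
    swa, n_avg = None, 0
    for r in range(rounds):
        client_models = []
        for c in centers:
            w = list(server)  # clients start from the current server model
            grad_fn = make_grad(c)
            for _ in range(local_steps):
                w = sam_step(w, grad_fn)
            client_models.append(w)
        server = average(client_models)
        if r >= swa_start:  # only average late-round server models
            swa = list(server) if swa is None else update_swa(swa, server, n_avg)
            n_avg += 1
    return server, swa


if __name__ == "__main__":
    print(run())
```

In this toy setup, both the final server model and its SWA average settle near the mean of the client optima. A realistic experiment would replace the quadratic losses with neural-network training on non-IID client data.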

Code & Models

Repositories

debcaldarola/fedsam
PyTorch · Official

Taxonomy

Methods: Stochastic Weight Averaging · Sharpness-Aware Minimization