Gradient Compression May Hurt Generalization: A Remedy by Synthetic Data Guided Sharpness Aware Minimization
Yujie Gu, Richeng Jin, Zhaoyang Zhang, Huaiyu Dai

TL;DR
This paper reveals that gradient compression can worsen generalization in federated learning by creating sharper loss landscapes, and proposes FedSynSAM, a method using synthetic data for better perturbation estimation to improve model robustness.
Contribution
The paper introduces FedSynSAM, a novel approach that uses synthetic data guided by global model trajectories to improve sharpness-aware minimization in federated learning.
Findings
Gradient compression leads to sharper loss landscapes in FL.
FedSynSAM improves generalization by better estimating global perturbations.
Experimental results validate the effectiveness of FedSynSAM.
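To make the term "gradient compression" in the findings above concrete, here is a minimal sketch of one widely used compressor, top-k sparsification. This summary does not say which compressors the paper studies, so the choice of top-k, the function name `top_k_sparsify`, and the parameter `k` are assumptions made purely for illustration.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero the rest.

    Illustrative only: top-k is an assumed compressor, not necessarily
    the one analyzed in the paper.
    """
    if k <= 0:
        return [0.0] * len(grad)
    # Indices sorted by absolute value, largest first; keep the first k.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


# A client would upload only the surviving entries (values plus indices).
compressed = top_k_sparsify([0.1, -2.0, 0.5, 3.0], k=2)
```

The discarded entries are the information the server never sees, which is the kind of distortion the findings associate with sharper loss landscapes.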
Abstract
It is commonly believed that gradient compression in federated learning (FL) enjoys significant improvement in communication efficiency with negligible performance degradation. In this paper, we find that gradient compression induces sharper loss landscapes in federated learning, particularly under non-IID data distributions, which suggests hindered generalization capability. The recently emerging Sharpness Aware Minimization (SAM) effectively searches for flat minima by incorporating a gradient ascent step (i.e., perturbing the model with gradients) before the celebrated stochastic gradient descent. Nonetheless, the direct application of SAM in FL suffers from inaccurate estimation of the global perturbation due to data heterogeneity. Existing approaches propose to utilize the model update from the previous communication round as a rough estimate. However, its effectiveness is…
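The SAM step described in the abstract (a gradient ascent perturbation followed by a descent step) can be sketched as follows. This is a generic, single-machine illustration of standard SAM, not FedSynSAM itself: the function name `sam_step`, the learning rate `lr`, the perturbation radius `rho`, and the toy quadratic loss are all assumptions introduced for the example.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM update on a parameter list w.

    grad_fn(w) returns the loss gradient at w as a list of floats.
    lr and rho are hypothetical hyperparameters for this sketch.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # stationary point: no ascent direction to follow
    # Ascent step: perturb the model by rho along the normalized gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient taken at the perturbed point to w.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w):
    # Gradient of the toy loss 0.5 * sum(w_i ** 2).
    return list(w)


w_next = sam_step([1.0], toy_grad, lr=0.1, rho=0.5)
```

In FL, each client would compute the perturbation from its own local gradient, which under heterogeneous data can differ from the perturbation implied by the global loss. That mismatch is the inaccurate estimation the abstract refers to.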
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
