FedImpro: Measuring and Improving Client Update in Federated Learning
Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, Xiaowen Chu

TL;DR
FedImpro introduces a novel method for mitigating client drift in federated learning by reconstructing feature distributions and decoupling model components, leading to improved generalization across heterogeneous data sources.
Contribution
This paper presents FedImpro, a new approach that constructs similar conditional distributions to reduce client dissimilarity and improve federated learning performance.
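The decoupling idea can be illustrated with a small sketch. A toy model is split into a low-level feature extractor and a high-level classifier. Per-class Gaussian statistics of the low-level features are estimated, and the high-level part is trained on the client's real features plus features sampled from those statistics. This is a hypothetical simplification, not the paper's implementation. The fixed `tanh` extractor, the diagonal-Gaussian estimate, the helpers `estimate_class_stats` and `sample_features`, the logistic-regression head, and the balanced pool standing in for server-aggregated statistics are all assumptions. Training of the low-level part is omitted.

```python
import math
import random
import statistics

random.seed(0)

def low_level(x):
    # Hypothetical low-level component: a fixed elementwise feature map.
    return [math.tanh(v) for v in x]

def estimate_class_stats(features, labels):
    # Per-class, per-dimension (mean, std): a diagonal-Gaussian assumption.
    stats = {}
    for c in set(labels):
        rows = [f for f, y in zip(features, labels) if y == c]
        stats[c] = [(statistics.fmean(d), statistics.pstdev(d)) for d in zip(*rows)]
    return stats

def sample_features(stats, c, n):
    # Draw n synthetic feature vectors for class c from the estimated Gaussian.
    return [[random.gauss(mu, sd) for mu, sd in stats[c]] for _ in range(n)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_high_level(features, labels, epochs=200, lr=0.5):
    # High-level component: logistic regression fitted by full-batch gradient descent.
    w, b, n = [0.0] * len(features[0]), 0.0, len(features)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for f, y in zip(features, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for i, fi in enumerate(f):
                gw[i] += err * fi / n
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def make_data(n0, n1):
    # Two Gaussian blobs in 2-D: class 0 around (-1, -1), class 1 around (1, 1).
    xs = [[random.gauss(-1, 0.5), random.gauss(-1, 0.5)] for _ in range(n0)]
    xs += [[random.gauss(1, 0.5), random.gauss(1, 0.5)] for _ in range(n1)]
    return xs, [0] * n0 + [1] * n1

local_x, local_y = make_data(40, 4)   # label-skewed client
pool_x, pool_y = make_data(50, 50)    # stand-in for server-aggregated statistics
global_stats = estimate_class_stats([low_level(x) for x in pool_x], pool_y)

# High-level training data: real local features plus reconstructed features.
feats = [low_level(x) for x in local_x]
labels = list(local_y)
for c in (0, 1):
    feats += sample_features(global_stats, c, 20)
    labels += [c] * 20

w, b = train_high_level(feats, labels)

def predict(x):
    # Full model: low-level extractor followed by the trained high-level head.
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, low_level(x))) + b) > 0.5)
```

In this toy setting the client holds only 4 class-1 examples. The sampled features give its high-level head a view of each class that is closer to the pooled conditional distribution, which is the intuition behind training on reconstructed feature distributions.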
Findings
FedImpro reduces gradient dissimilarity in FL.
It enhances model generalization under data heterogeneity.
Experimental results confirm improved performance with FedImpro.
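Gradient dissimilarity can be quantified in several ways. The minimal sketch below assumes the common choice of the average squared L2 distance between each client's gradient and the mean client gradient. The paper's exact metric may differ, and `gradient_dissimilarity` is a hypothetical helper.

```python
def gradient_dissimilarity(client_grads):
    # client_grads: one flattened gradient (list of floats) per client.
    k = len(client_grads)
    dim = len(client_grads[0])
    mean = [sum(g[i] for g in client_grads) / k for i in range(dim)]
    # Average squared L2 distance from the mean gradient (a variance-style measure).
    return sum(sum((g[i] - mean[i]) ** 2 for i in range(dim)) for g in client_grads) / k

print(gradient_dissimilarity([[1.0, 0.0], [-1.0, 0.0]]))  # 1.0
print(gradient_dissimilarity([[0.5, 0.5], [0.5, 0.5]]))   # 0.0
```

A lower value means the client updates point in more similar directions. Identical client gradients give exactly zero.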
Abstract
Federated Learning (FL) models often suffer from client drift caused by heterogeneous data, where the data distribution differs across clients. To address this issue, recent research primarily focuses on manipulating existing gradients to achieve more consistent client models. In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved local models. First, we analyze the generalization contribution of local training and conclude that this contribution is bounded by the conditional Wasserstein distance between the data distributions of different clients. Then, we propose FedImpro to construct similar conditional distributions for local training. Specifically, FedImpro decouples the model into high-level and low-level components and trains the high-level portion on reconstructed feature distributions. This…
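The abstract bounds the generalization contribution by a conditional Wasserstein distance between clients' data distributions. The sketch below computes a simple proxy for that quantity. For each label y, it takes the 1-D Wasserstein distance between the two clients' samples conditioned on y, then averages these distances with label weights. The one-dimensional setting, equal sample sizes, the weighting scheme, and the helper names are simplifying assumptions, not the paper's definition.

```python
def wasserstein_1d(a, b, p=1):
    # For equal-size 1-D empirical samples, the optimal coupling matches sorted values.
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    diffs = [abs(x - y) ** p for x, y in zip(sorted(a), sorted(b))]
    return (sum(diffs) / len(diffs)) ** (1.0 / p)

def conditional_wasserstein(client_a, client_b, label_weights):
    # client_a[y], client_b[y]: 1-D samples of a feature with label y on each client.
    # label_weights[y]: assumed weight for label y (e.g. a shared label marginal).
    return sum(w * wasserstein_1d(client_a[y], client_b[y])
               for y, w in label_weights.items())

a = {0: [0.0, 1.0, 2.0], 1: [5.0, 6.0, 7.0]}
b = {0: [1.0, 2.0, 3.0], 1: [5.0, 6.0, 7.0]}
print(conditional_wasserstein(a, b, {0: 0.5, 1: 0.5}))  # 0.5
```

Here class 0 is shifted by 1 between the clients and class 1 is identical, so the weighted distance is 0.5. Multi-dimensional features would need a general optimal-transport solver instead of sorted matching.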
Peer Reviews
Decision: ICLR 2024 poster
- The paper addresses a very relevant issue for the FL community, i.e., limiting the negative effects of client drift in heterogeneous settings.
- The paper is well written and easy to follow.
- Very detailed discussion of related works.
- Theoretical claims are supported by proofs.
- Extensive empirical analysis. FedImpro is compared with some state-of-the-art approaches in terms of final performance, convergence speed, and weight divergence. Interesting ablation study on the depth of gradient decoupling…
- My main concern is the feasibility of deploying FedImpro in real-world contexts. FedImpro notably increases both the number of communications between clients and server and the message size. The paper points out that the global distribution can be estimated using methods that place less load on the communication network, but this does not eliminate the need for additional communication.
- Some relevant related works are not discussed: ETF [1], SphereFed [2], FedSpeed [3].
- FedImpro is compar…
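The message-size part of this concern can be estimated with simple arithmetic. The sketch below is hypothetical. It assumes each client uploads its model update plus a per-class mean vector and variance vector of a `feature_dim`-dimensional intermediate feature, which adds 2 · num_classes · feature_dim floats per upload. The actual payload depends on what FedImpro shares, and the extra communication rounds the review mentions are not counted.

```python
def upload_floats(num_params, num_classes, feature_dim, share_stats=True):
    # Model update plus (assumed) per-class mean and variance vectors of the features.
    extra = 2 * num_classes * feature_dim if share_stats else 0
    return num_params + extra

def overhead_ratio(num_params, num_classes, feature_dim):
    # Relative increase in upload size caused by sharing the feature statistics.
    base = upload_floats(num_params, num_classes, feature_dim, share_stats=False)
    return upload_floats(num_params, num_classes, feature_dim) / base - 1.0

# Hypothetical sizes: a 1M-parameter model, 10 classes, 512-dim features.
print(upload_floats(1_000_000, 10, 512))              # 1010240
print(round(overhead_ratio(1_000_000, 10, 512), 5))   # 0.01024
```

Under these assumptions the statistics add about 1% to each upload. The overhead grows with the number of classes and with the feature dimension at the split point, which for a convolutional split could be channels × height × width.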
1. This paper is well written and easy to follow, and the idea is easy to understand.
2. This paper combines split training with feature sharing to improve the generalization of the model.
1. I notice that the authors ignore a very related, state-of-the-art baseline, FedDyn [1]. Could the authors conduct comparison experiments with FedDyn?
2. The training time increases with FedImpro. Could the authors report CPU-time cost comparisons for reaching the target accuracy?
[1] Acar, Durmus Alp Emre, et al. "Federated learning based on dynamic regularization." arXiv preprint arXiv:2111.04263 (2021).
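The review's request for time-to-target-accuracy comparisons can be made concrete with a small helper. This is a hypothetical sketch. `timed` measures the CPU time of the current process with Python's `time.process_time()`, which excludes sleep and GPU computation (`time.perf_counter()` would give wall-clock time instead). `time_to_accuracy` assumes a log of `(round_seconds, accuracy)` pairs recorded after each round.

```python
import time

def timed(fn, *args, **kwargs):
    # Run fn and return (result, CPU seconds consumed by this process).
    start = time.process_time()
    result = fn(*args, **kwargs)
    return result, time.process_time() - start

def time_to_accuracy(log, target):
    # log: [(seconds spent in round, accuracy after round), ...] in round order.
    elapsed = 0.0
    for round_seconds, accuracy in log:
        elapsed += round_seconds
        if accuracy >= target:
            return elapsed
    return None  # target never reached within the logged rounds

log = [(10.0, 0.50), (10.0, 0.70), (10.0, 0.80)]
print(time_to_accuracy(log, 0.70))  # 20.0
print(time_to_accuracy(log, 0.90))  # None
```

Comparing methods by accumulated time rather than by round count captures the extra per-round cost that the review is concerned about.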
1. The idea of generalization contribution in FL sounds novel.
2. The experimental performance of FedImpro looks superior.
The idea of having a lower-level and a higher-level neural network in FL is not new (e.g., the feature-extraction network idea). I don't see many comparisons to this previous work in the experimental section.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cloud Data Security Solutions · Access Control and Trust
