Unlocking the Potential of Federated Learning for Deeper Models
Haolin Wang, Xuefeng Liu, Jianwei Niu, Shaojie Tang, Jiaxing Shen

TL;DR
This paper identifies the challenges of applying federated learning to deeper neural networks due to divergence accumulation and proposes guidelines to mitigate this, significantly improving model performance.
Contribution
It uncovers divergence accumulation as a key issue in FL with deep models and offers practical strategies to enhance performance.
Findings
Divergence accumulation causes performance decline in deep FL models.
Theoretical analysis and empirical evidence both support the divergence accumulation effect.
Guidelines like wider models and reduced receptive fields improve deep FL accuracy.
Abstract
Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients without compromising their privacy. Although FL has demonstrated remarkable success in various scenarios, recent studies mainly utilize shallow and small neural networks. In our research, we discover a significant performance decline when applying the existing FL framework to deeper neural networks, even when client data are independently and identically distributed (i.i.d.). Our further investigation shows that the decline is due to the continuous accumulation of dissimilarities among client models during the layer-by-layer back-propagation process, which we refer to as "divergence accumulation." As deeper models involve a longer chain of divergence accumulation, they tend to manifest greater divergence, subsequently leading to performance decline.…
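The notion of dissimilarity among client models can be made concrete with a small sketch. The metric below is an assumption for illustration, not the paper's definition: for each layer, it takes the mean squared distance of the client weights from their average, divided by the squared norm of that average. The function name `layer_divergence` and the toy weights are hypothetical.

```python
import numpy as np

def layer_divergence(client_weights):
    """Per-layer relative divergence among client models.

    client_weights: list with one entry per client; each entry is a list of
    per-layer NumPy arrays with matching shapes across clients.
    NOTE: this metric is an illustrative assumption, not the paper's definition.
    """
    num_clients = len(client_weights)
    num_layers = len(client_weights[0])
    divergences = []
    for layer in range(num_layers):
        # Stack the same layer from every client: shape (clients, ...).
        stacked = np.stack([w[layer] for w in client_weights])
        mean = stacked.mean(axis=0)
        # Mean squared distance of each client's layer from the layer average.
        sq_dist = ((stacked - mean) ** 2).reshape(num_clients, -1).sum(axis=1).mean()
        # Normalize by the squared norm of the average so layers are comparable.
        divergences.append(float(sq_dist / (np.sum(mean ** 2) + 1e-12)))
    return divergences

# Toy example: two clients, two "layers" (hypothetical values).
client_a = [np.array([1.0, 1.0]), np.array([[2.0]])]
client_b = [np.array([3.0, 3.0]), np.array([[2.0]])]
toy_divergence = layer_divergence([client_a, client_b])
print(toy_divergence)
```

Tracking such a per-layer quantity over communication rounds is one way to check whether some layers drift apart more than others; the choice of normalization affects how layers of different scale compare.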
Peer Reviews
Decision · ICLR 2024 Conference Withdrawn Submission
- This paper investigates an interesting topic.
- The introduction presents the motivation well.
- The overall study design is easy to follow.
- The authors try to provide both theoretical and empirical evaluations of divergence accumulation.
- Observations in Fig 2(a) may not be accurate. Divergence of deep layers also tends to increase and converge if the model is trained for more than 40 rounds.
- Given \epsilon_i with Z, Assumption 2, which assumes \epsilon_i is dependent with Z and H, may not hold.
- The theorem of divergence accumulation is proved only for linear layers. It does not consider other important layers in modern deep neural networks, e.g., convolutional layers and normalization layers.
- The proposed theorem does not sh
The paper is well-written and easy to follow. The paper focuses specifically on the challenge of applying federated learning to deeper neural networks, which is an important problem.
The analysis of divergence accumulation is primarily based on a simplified linear layer with an activation function. However, since the authors conducted experiments using ResNet, it would be more appropriate for them to provide the analysis based on the residual module. Additionally, the process of deriving the entire formula lacks clarity and is challenging to comprehend.
- The authors give guidelines on how to improve models in federated learning.
- They run various experiments and try to prove the divergence accumulation phenomenon.
Various claims and steps in this paper are flawed. For example, it is not surprising that increasing width increases performance. The problem is that increasing depth is decreasing performance in your case, which is something that needs to be investigated in depth and demonstrated with careful experiments because it is against the strongest point of "deep" learning and its common wisdom. Another thing is that federated learning can be reduced to SGD in the simplest setting (one local step, iid c
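The remark that federated learning reduces to SGD in the simplest setting can be checked numerically. The following is a minimal sketch under assumptions that are not taken from the paper or the review: FedAvg-style averaging with equal-sized clients, a single full-batch local gradient step per round, and a linear least-squares model on synthetic data. All names and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, n_per_client, dim, lr = 4, 8, 3, 0.1  # hypothetical toy sizes

# Synthetic data, split evenly across clients.
X = rng.normal(size=(num_clients, n_per_client, dim))
y = rng.normal(size=(num_clients, n_per_client))
w0 = rng.normal(size=dim)  # shared starting point (the "global model")

def grad(w, X, y):
    # Gradient of the loss 0.5 * mean((X @ w - y) ** 2) with respect to w.
    return X.T @ (X @ w - y) / len(y)

# Federated round: every client takes ONE local step from w0,
# then the server averages the resulting client models.
client_models = [w0 - lr * grad(w0, X[k], y[k]) for k in range(num_clients)]
w_fedavg = np.mean(client_models, axis=0)

# Centralized baseline: one gradient step on the pooled data.
X_all = X.reshape(-1, dim)
y_all = y.reshape(-1)
w_central = w0 - lr * grad(w0, X_all, y_all)

# With equal client sizes, the pooled gradient is the average of the
# client gradients, so the two updates coincide up to rounding error.
gap = float(np.max(np.abs(w_fedavg - w_central)))
print(gap)
```

The equivalence relies on the single local step and equal client sizes; with several local steps per round, the client models are updated from different intermediate points and the averaged model generally differs from the centralized one.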
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Traffic Prediction and Management Techniques
