Unlocking the Potential of Federated Learning for Deeper Models

Haolin Wang; Xuefeng Liu; Jianwei Niu; Shaojie Tang; Jiaxing Shen

arXiv:2306.02701·cs.LG·June 6, 2023·2 cites

Open Access 3 Reviews

TL;DR

This paper identifies the challenges of applying federated learning to deeper neural networks due to divergence accumulation and proposes guidelines to mitigate this, significantly improving model performance.

Contribution

It uncovers divergence accumulation as a key issue in FL with deep models and offers practical strategies to enhance performance.

Findings

01 Divergence accumulation causes performance decline in deep FL models.

02 Theoretical and empirical evidence supports divergence effects.

03 Guidelines like wider models and reduced receptive fields improve deep FL accuracy.

Abstract

Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients without compromising their privacy. Although FL has demonstrated remarkable success in various scenarios, recent studies mainly utilize shallow and small neural networks. In our research, we discover a significant performance decline when applying the existing FL framework to deeper neural networks, even when client data are independently and identically distributed (i.i.d.). Our further investigation shows that the decline is due to the continuous accumulation of dissimilarities among client models during the layer-by-layer back-propagation process, which we refer to as "divergence accumulation." As deeper models involve a longer chain of divergence accumulation, they tend to manifest greater divergence, subsequently leading to performance decline.…
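One way to make the notion of dissimilarity among client models concrete is to measure, layer by layer, how far each client's weights sit from their average. The sketch below is illustrative only and is not taken from the paper: the metric (mean squared distance to the averaged weights, normalized by the squared norm of that average) and the names `fedavg` and `layerwise_divergence` are assumptions, and each layer is represented as a flat list of floats.

```python
from statistics import fmean


def fedavg(client_models):
    """Average each layer across clients with equal weighting.

    client_models[k][l] is a flat list of floats: layer l of client k.
    """
    return [
        [fmean(values) for values in zip(*layer_across_clients)]
        for layer_across_clients in zip(*client_models)
    ]


def layerwise_divergence(client_models):
    """Per-layer divergence under an assumed metric (not the paper's definition).

    For each layer: mean squared distance of client weights from the averaged
    weights, divided by the squared norm of the averaged weights.
    """
    global_model = fedavg(client_models)
    divergences = []
    for layer_idx, avg in enumerate(global_model):
        sq_dists = [
            sum((w - a) ** 2 for w, a in zip(model[layer_idx], avg))
            for model in client_models
        ]
        avg_sq_norm = sum(a * a for a in avg)
        # Small constant guards against division by zero for all-zero layers.
        divergences.append(fmean(sq_dists) / (avg_sq_norm + 1e-12))
    return divergences


# Two hypothetical clients with two layers each: they disagree on the
# first layer and agree exactly on the second.
clients = [
    [[1.0, 1.0], [0.0, 2.0]],
    [[3.0, 3.0], [0.0, 2.0]],
]
per_layer = layerwise_divergence(clients)
```

Under this assumed metric, plotting `per_layer` against layer index after each communication round would be one way to check whether dissimilarity grows along the back-propagation path, as the abstract describes.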

Peer Reviews

Decision·ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- This paper investigates an interesting topic.
- The introduction well presents the motivation.
- The overall study design is easy to follow.
- The authors try to provide both theoretical and empirical evaluations on the divergence accumulation.

Weaknesses

- Observations in Fig 2(a) may not be accurate: the divergence of deep layers also tends to increase and converge if the model is trained for more than 40 rounds.
- Given \epsilon_i with Z, Assumption 2, which assumes that \epsilon_i is dependent with Z and H, may not hold.
- The theorem of divergence accumulation is proved only for linear layers. It does not consider other important layers in modern deep neural networks, e.g., convolutional layers and normalization layers.
- The proposed theorem does not sh

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 2

Strengths

The paper is well-written and easy to follow. The paper focuses specifically on the challenge of applying federated learning to deeper neural networks, which is an important problem.

Weaknesses

The analysis of divergence accumulation is primarily based on a simplified linear layer with an activation function. However, since the authors conducted experiments using ResNet, it would be more appropriate for them to provide the analysis based on the residual module. Additionally, the process of deriving the entire formula lacks clarity and is challenging to comprehend.

Reviewer 03 · Rating 1 (strong reject) · Confidence 4

Strengths

- The authors give guidelines on how to improve models in federated learning.
- They run various experiments and try to prove the divergence accumulation phenomenon.

Weaknesses

Various claims and steps in this paper are flawed. For example, it is not surprising that increasing width increases performance. The problem is that increasing depth decreases performance in your case, which needs to be investigated in depth and demonstrated with careful experiments, because it runs against the strongest point of "deep" learning and its common wisdom. Another thing is that federated learning can be reduced to SGD in the simplest setting (one local step, iid c

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Traffic Prediction and Management Techniques