FedWon: Triumphing Multi-domain Federated Learning Without Normalization
Weiming Zhuang, Lingjuan Lyu

TL;DR
FedWon introduces a normalization-free federated learning method that effectively handles multi-domain data heterogeneity, outperforming existing approaches in accuracy and robustness across various datasets and models.
Contribution
The paper proposes FedWon, a novel normalization-free federated learning approach that addresses multi-domain data challenges by reparameterizing convolution layers, improving performance over state-of-the-art methods.
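The reviews below identify this reparameterization as Scaled Weight Standardization (Brock et al. (2021a)). The following is a minimal NumPy sketch of that idea, not the authors' implementation: each output channel of a convolution kernel is shifted to zero mean and rescaled so that its variance is roughly 1/fan-in. The function name `scaled_weight_standardization`, the `eps` value, the default `gamma=1.0`, and the optional per-channel `gain` are illustrative assumptions.

```python
import numpy as np

def scaled_weight_standardization(weight, gain=None, gamma=1.0, eps=1e-4):
    """Standardize a conv kernel per output channel (illustrative sketch).

    weight: array of shape (out_channels, in_channels, kh, kw)
    gain:   optional per-output-channel scale, shape (out_channels,) (assumption)
    gamma:  fixed constant tied to the activation; 1.0 is an assumed default
    eps:    small constant for numerical stability (assumed value)
    """
    out_channels = weight.shape[0]
    fan_in = weight[0].size  # in_channels * kh * kw
    flat = weight.reshape(out_channels, fan_in)
    mean = flat.mean(axis=1, keepdims=True)
    var = flat.var(axis=1, keepdims=True)
    # Zero mean per output channel, variance of roughly gamma**2 / fan_in.
    standardized = gamma * (flat - mean) / np.sqrt(var * fan_in + eps)
    if gain is not None:
        standardized = standardized * gain.reshape(out_channels, 1)
    return standardized.reshape(weight.shape)

# Usage: standardize a random 3x3 kernel with 3 input and 8 output channels.
rng = np.random.default_rng(0)
w = rng.normal(loc=0.5, scale=2.0, size=(8, 3, 3, 3))
w_hat = scaled_weight_standardization(w)
```

Because the statistics come from the weights rather than from mini-batch activations, a layer built this way has no per-domain batch statistics to track, which is consistent with the claim that the method still works with very small batch sizes.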
Findings
FedWon surpasses FedAvg and FedBN in accuracy across all tested datasets.
FedWon maintains strong performance even with small batch sizes.
FedWon effectively handles both multi-domain and skewed label distribution scenarios.
Abstract
Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients. Nevertheless, FL encounters challenges due to non-independent and identically distributed (non-i.i.d) data, leading to potential performance degradation and hindered convergence. While prior studies predominantly addressed the issue of skewed label distribution, our research addresses a crucial yet frequently overlooked problem known as multi-domain FL. In this scenario, clients' data originate from diverse domains with distinct feature distributions, instead of label distributions. To address the multi-domain problem in FL, we propose a novel method called Federated learning Without normalizations (FedWon). FedWon draws inspiration from the observation that batch normalization (BN) faces challenges in effectively modeling the statistics of multiple domains, while existing…
Peer Reviews
Decision: ICLR 2024 poster
This paper studies the multi-domain FL problem and proposes FedWon, a novel FL method that employs the Scaled Weight Standardization technique as an alternative to the batch-normalization module.
1. FedWon achieves performance competitive with SOTA methods without additional computation cost during inference.
2. Experiments on multi-domain datasets show that FedWon outperforms conventional FL methods even when the training batch size is small (1 or 2), and the visual
1. The paper brings the Scaled Weight Standardization (SWS) technique to the multi-domain FL problem. However, there is little analysis of the impact of SWS on the FL process, e.g., does it lead to a better convergence bound?
2. Many methods are compared in the paper: some use BN, some are suitable only for cross-silo FL, and some are suitable for cross-device FL. A structured summary would make it clearer which scenarios thes
The proposed method is easy to implement and can potentially be plugged into many existing methods. The experiments are extensive and the paper is well written in general.
* I have many questions that I hope can be resolved. Some arise from questionable arguments in the paper, some from abnormal experimental results, and some from my curiosity about why the proposed method works. Please see the Questions section for details.
* The novelty is quite limited: the proposed method applies an existing reparametrization trick (Brock et al. (2021a)) in the FL setting.
- The paper is easy to read, and the comparison figure (Figure 2) effectively illustrates the differences from previous methods.
- The ablation experiments for cross-silo federated learning cover various factors affecting model performance, such as batch size and client sampling rate.
- The proposed method lacks innovation; it essentially applies the weight standardization and gradient clipping from the NF-Net series [1, 2] directly to the federated learning setting. It does not offer targeted improvements that address the unique challenges of federated learning.
- The cross-device FL experiments are not sufficient to demonstrate the proposed method's effectiveness. They include only a single dataset and 100 clients. [1] Brock, Andr
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
Methods: Convolution · Batch Normalization
