Understanding the Role of Layer Normalization in Label-Skewed Federated Learning
Guojun Zhang, Mahdi Beitollahi, Alex Bie, Xi Chen

TL;DR
This paper investigates how layer normalization improves federated learning under label shift by controlling feature collapse and overfitting, leading to faster convergence, especially with skewed data distributions.
Contribution
It reveals the connection between layer normalization and label shift in federated learning, identifying feature normalization as a key mechanism that enhances training stability and convergence.
Findings
Normalization drastically improves performance under extreme label shift.
Feature normalization within layer normalization controls feature collapse and overfitting.
Layer normalization remains robust to learning rate choices in federated settings.
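To make "extreme label shift" concrete, the sketch below builds a one-class-per-client split of a labeled dataset. This particular split is an assumption chosen for illustration, and the helper names `partition_by_label` and `label_histogram` are hypothetical; the summary above does not specify how the skewed partitions were built.

```python
from collections import Counter, defaultdict


def partition_by_label(labels):
    """Group sample indices by label, so each client holds exactly one class.

    Illustrative assumption: one client per distinct label.
    Returns a dict mapping client id (the label) to a list of sample indices.
    """
    shards = defaultdict(list)
    for index, label in enumerate(labels):
        shards[label].append(index)
    return dict(shards)


def label_histogram(labels, indices):
    """Count how often each label occurs among the given sample indices."""
    return Counter(labels[i] for i in indices)


# Toy dataset with three classes.
toy_labels = [0, 1, 0, 2, 1, 2]
clients = partition_by_label(toy_labels)
for client_id, sample_indices in sorted(clients.items()):
    print(client_id, sample_indices, dict(label_histogram(toy_labels, sample_indices)))
```

Each client's histogram contains a single label, so the local label distribution differs from the global one for every client.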
Abstract
Layer normalization (LN) is a widely adopted deep learning technique, especially in the era of foundation models. Recently, LN has been shown to be surprisingly effective in federated learning (FL) with non-i.i.d. data. However, exactly why and how it works remains mysterious. In this work, we reveal the profound connection between layer normalization and the label shift problem in federated learning. To understand layer normalization better in FL, we identify the key contributing mechanism of normalization methods in FL, called feature normalization (FN), which applies normalization to the latent feature representation before the classifier head. Although LN and FN do not improve expressive power, they control feature collapse and local overfitting to heavily skewed datasets, and thus accelerate global training. Empirically, we show that normalization leads to drastic improvements on…
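As a rough illustration of the two operations named in the abstract, the sketch below contrasts layer normalization with feature normalization applied just before a linear classifier head. The abstract does not give the exact FN formula, so rescaling the feature vector to unit L2 norm is an assumption here, and the function names (`layer_norm`, `feature_norm`, `classify_with_fn`) are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Layer normalization without learnable scale/shift:
    subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def feature_norm(x, eps=1e-12):
    """Assumed form of FN: rescale the latent feature vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / (norm + eps) for v in x]


def linear_head(features, weights, bias):
    """Plain linear classifier head: one logit per row of `weights`."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def classify_with_fn(features, weights, bias):
    """Normalize the latent features first, then apply the classifier head."""
    return linear_head(feature_norm(features), weights, bias)


# Toy head with two classes and two latent dimensions.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
small = classify_with_fn([3.0, 4.0], W, b)
large = classify_with_fn([30.0, 40.0], W, b)
print(small, large)
```

Under this assumed form, the logits depend only on the direction of the latent feature vector, not on its magnitude, so scaling the features by a constant leaves the head's output essentially unchanged.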
Taxonomy
Topics: Computational and Text Analysis Methods
Methods: Layer Normalization
