Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay
Leyan Pan, Xinyuan Cao

TL;DR
This paper investigates how batch normalization and weight decay influence Neural Collapse, revealing their critical roles in the geometric structure of deep neural network features at the end of training.
Contribution
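The paper establishes an asymptotic lower bound on the emergence of Neural Collapse in the near-optimal loss regime. The bound depends only on the WD value, the training loss, and the presence of last-layer BN, and is supported by experiments showing how these factors affect feature geometry.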
Findings
BN enhances Neural Collapse presence
Appropriate WD values promote collapse
Lower training loss correlates with stronger collapse
Abstract
Neural Collapse (NC) is a geometric structure recently observed at the terminal phase of training deep neural networks, which states that last-layer feature vectors for the same class would "collapse" to a single point, while features of different classes become equally separated. We demonstrate that batch normalization (BN) and weight decay (WD) critically influence the emergence of NC. In the near-optimal loss regime, we establish an asymptotic lower bound on the emergence of NC that depends only on the WD value, training loss, and the presence of last-layer BN. Our experiments substantiate theoretical insights by showing that models demonstrate a stronger presence of NC with BN, appropriate WD values, lower loss, and lower last-layer feature norm. Our findings offer a novel perspective in studying the role of BN and WD in shaping neural network features.
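The two geometric properties described in the abstract can be checked numerically. The sketch below is a minimal illustration, not necessarily the metric used in the paper: it assumes NC is measured by the average cosine similarity between globally centered last-layer features, within a class and across classes. The function name `nc_cosine_stats` and the toy data are hypothetical.

```python
import numpy as np


def nc_cosine_stats(features, labels):
    """Return (mean intra-class cosine, mean inter-class cosine).

    Assumption for illustration: features are centered by their global mean
    before cosine similarities are computed.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Center by the global mean, then normalize each feature to unit length.
    centered = features - features.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    unit = centered / np.clip(norms, 1e-12, None)

    # Pairwise cosine similarities between all samples.
    cos = unit @ unit.T

    same_class = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)

    intra = cos[same_class & not_self].mean()  # same class, distinct samples
    inter = cos[~same_class].mean()            # different classes
    return intra, inter


# Toy data: 3 classes whose features have fully collapsed onto the vertices
# of a simplex equiangular tight frame (rows of I - (1/C) * ones).
C = 3
class_means = np.eye(C) - np.ones((C, C)) / C
labels = np.repeat(np.arange(C), 4)   # 4 samples per class
features = class_means[labels]        # every sample equals its class mean

intra, inter = nc_cosine_stats(features, labels)
# Under perfect collapse, intra is approximately 1 and inter is
# approximately -1 / (C - 1), i.e. -0.5 for C = 3.
print(intra, inter)
```

In this idealized case the intra-class similarity is maximal and the inter-class similarity equals the value at which all classes are equally separated. On features from a trained network, values further from these targets would indicate a weaker presence of NC.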
Taxonomy
Topics: Neural Networks and Applications · Cell Image Analysis Techniques · Adversarial Robustness in Machine Learning
Methods: Weight Decay · Batch Normalization
