Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics
Han Bao

TL;DR
This paper demonstrates that feature normalization, when combined with cosine loss, stabilizes non-contrastive learning dynamics and prevents collapse, providing a theoretical foundation for more robust self-supervised learning methods.
Contribution
The author extends existing theory by incorporating feature normalization and cosine loss, revealing their role in preventing collapse in non-contrastive learning.
Findings
Feature normalization induces sixth-order dynamics.
Stable equilibrium can emerge even from collapsed solutions.
Normalization is crucial for preventing collapse in practice.
Abstract
Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong…
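To make the distinction in the abstract concrete, the following is a minimal sketch, not the paper's code, of a Euclidean loss on raw features versus a cosine loss on normalized features. The function names, array shapes, and noise level are illustrative assumptions. It only shows that the cosine loss ignores the overall scale of the features, whereas the Euclidean loss can be driven toward zero simply by shrinking them.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Feature normalization: scale each row of x to unit Euclidean norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def euclidean_loss(online, target):
    """Half the mean squared Euclidean distance between raw features."""
    return 0.5 * np.mean(np.sum((online - target) ** 2, axis=-1))

def cosine_loss(online, target):
    """Negative mean cosine similarity, computed on normalized features."""
    sims = np.sum(l2_normalize(online) * l2_normalize(target), axis=-1)
    return -np.mean(sims)

# Hypothetical features of two positive views (batch of 8, dimension 16).
rng = np.random.default_rng(0)
online = rng.normal(size=(8, 16))
target = online + 0.1 * rng.normal(size=(8, 16))

# Shrinking both views toward the origin mimics a collapsing representation.
scale = 1e-3
shrunk_online, shrunk_target = scale * online, scale * target

print("Euclidean loss:", euclidean_loss(online, target),
      "->", euclidean_loss(shrunk_online, shrunk_target))
print("Cosine loss:   ", cosine_loss(online, target),
      "->", cosine_loss(shrunk_online, shrunk_target))
```

Under these assumptions the Euclidean loss falls by a factor of `scale ** 2` while the cosine loss is unchanged, which is one informal way to see why normalization removes the incentive to shrink all features toward a single point.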
Peer Reviews
Decision · Submitted to ICLR 2024
1. The technical analysis appears rigorous and reasonably clear to follow.
1. Considering that the majority of the analysis assumes the norms of all features are nearly the same, due to the high-dimensional limit, I do not see how this analysis can show the effects of feature normalization on non-contrastive learning. 2. Moreover, the notions of "6-th order dynamics" and "3rd order dynamics" are not sufficiently explained in the paper. 3. Most importantly, I'm not convinced this is an interesting problem to study in the context of prior work providing key understanding regarding ho
This article presents an improvement over the theoretical framework of non-contrastive learning that uses solely the Euclidean loss. The paper is well-written and easy to follow. The assumptions taken are relatively well justified and allow for an interesting analysis.
**Previous literature** There have been recent contributions to the literature of non-contrastive learning which do take into account the cosine loss, and which are not referenced in this article. In particular, Halvagal et al., Implicit variance regularization in non-contrastive SSL, 2023. The eigenmode dynamics seem extremely similar (after some changes in the notation) and it seems extremely important to me that the authors compare themselves to this article. The authors also do not seem to h
- The paper is well-written and easy to follow. - This work proves that the feature norm concentrates around a constant with proper parameter initialization.
1. Some of the assumptions are quite stringent, especially since this paper is not pioneering work, and they may not provide much reference value for practical non-contrastive learning with negative pairs. 2. Assumptions 2 and 3 in section 4 are rather strict. Assumption 2 requires that the input data follow an isotropic Gaussian distribution, which is hard to accept in practical situations. Perhaps a mixture of isotropic Gaussians could be considered. Assumption 3 pertains to the width-infinit
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks
Methods: Bootstrap Your Own Latent
