Implicit variance regularization in non-contrastive SSL
Manu Srinath Halvagal, Axel Laborieux, Friedemann Zenke

TL;DR
This paper provides a theoretical analysis of how predictor networks in non-contrastive self-supervised learning methods like BYOL and SimSiam prevent collapse through implicit variance regularization, and introduces IsoLoss to improve learning dynamics.
Contribution
The paper offers a theoretical account of implicit variance regularization in non-contrastive SSL and proposes IsoLoss, a family of isotropic loss functions, to improve convergence and robustness.
Findings
Both Euclidean and cosine similarity losses avoid collapse through implicit variance regularization, albeit via different dynamical mechanisms.
Eigenvalues act as effective learning rate multipliers, so convergence rates differ across eigenmodes.
IsoLoss accelerates initial learning and improves robustness.
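To make the "learning rate multiplier" finding concrete, here is a toy sketch, not the paper's derived dynamics. It assumes a hypothetical relaxation rule in which each eigenmode moves toward a fixed target with step size `lr * eigenvalue`; the function name `simulate_modes`, the `equalize` flag, and the target value are all illustrative assumptions. With `equalize=True` the eigenvalue factor is dropped, which only mimics the stated goal of equalizing convergence rates across eigenmodes.

```python
import numpy as np

def simulate_modes(eigenvalues, lr=0.1, steps=50, target=1.0, equalize=False):
    """Toy per-mode relaxation (illustrative assumption, not the paper's equations).

    Each mode x_i moves toward `target`. Without equalization, the step of
    mode i is scaled by its eigenvalue, so the eigenvalue behaves like a
    learning rate multiplier. With equalization, every mode uses the same step.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    x = np.zeros_like(lam)
    for _ in range(steps):
        scale = np.ones_like(lam) if equalize else lam
        x += lr * scale * (target - x)
    return x

# Two modes with a small and a large eigenvalue (arbitrary toy values).
unequal = simulate_modes([0.1, 1.0])
equal = simulate_modes([0.1, 1.0], equalize=True)
print("eigenvalue-scaled:", unequal)
print("equalized:        ", equal)
```

In this toy setting the mode with the small eigenvalue lags far behind the other after the same number of steps, while the equalized variant moves both modes at the same rate.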
Abstract
Non-contrastive SSL methods like BYOL and SimSiam rely on asymmetric predictor networks to avoid representational collapse without negative samples. Yet, how predictor networks facilitate stable learning is not fully understood. While previous theoretical analyses assumed Euclidean losses, most practical implementations rely on cosine similarity. To gain further theoretical insight into non-contrastive SSL, we analytically study learning dynamics in conjunction with Euclidean and cosine similarity in the eigenspace of closed-form linear predictor networks. We show that both avoid collapse through implicit variance regularization albeit through different dynamical mechanisms. Moreover, we find that the eigenvalues act as effective learning rate multipliers and propose a family of isotropic loss functions (IsoLoss) that equalize convergence rates across eigenmodes. Empirically, IsoLoss…
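The two objectives the abstract contrasts can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `euclidean_loss` and `cosine_loss`, the identity predictor `W_p`, the batch shapes, and the noise level are all hypothetical. NumPy has no automatic differentiation, so the target branch is simply treated as a constant here.

```python
import numpy as np

def euclidean_loss(pred, target):
    """Half the mean squared Euclidean distance between predictions and targets."""
    return 0.5 * np.mean(np.sum((pred - target) ** 2, axis=1))

def cosine_loss(pred, target, eps=1e-12):
    """Negative mean cosine similarity; invariant to the norm of each vector."""
    pred_n = pred / (np.linalg.norm(pred, axis=1, keepdims=True) + eps)
    target_n = target / (np.linalg.norm(target, axis=1, keepdims=True) + eps)
    return -np.mean(np.sum(pred_n * target_n, axis=1))

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 4))               # embeddings of view 1 (toy data)
z2 = z1 + 0.1 * rng.normal(size=(8, 4))    # embeddings of view 2 (toy data)
W_p = np.eye(4)                            # hypothetical linear predictor

pred = z1 @ W_p.T                          # predictor applied to view 1
print("Euclidean loss:", euclidean_loss(pred, z2))
print("cosine loss:   ", cosine_loss(pred, z2))
```

One visible difference is scale sensitivity: rescaling `pred` changes the Euclidean loss but leaves the cosine loss unchanged.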
Taxonomy
Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques
