BYOL works even without batch statistics
Pierre H. Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko

TL;DR
This paper demonstrates that Batch Normalization is not essential for BYOL's success, showing that alternative normalization methods can achieve comparable performance in self-supervised image representation learning.
Contribution
It provides experimental evidence that a batch-independent normalization scheme (group normalization combined with weight standardization) can replace batch normalization in BYOL while achieving comparable performance.
Findings
Replacing BN with group normalization and weight standardization maintains performance.
Batch statistics are not critical for preventing collapse in BYOL.
BYOL achieves similar accuracy without relying on batch-dependent normalization.
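To make "batch-independent" concrete, here is a minimal NumPy sketch of group normalization and weight standardization, with a plain training-mode batch normalization for contrast. It is an illustration under stated assumptions, not the authors' implementation: the function names, tensor shapes, `eps` values, and the omission of learned scale and offset parameters are all choices made for this example.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize each sample over groups of channels. x: (N, C, H, W)."""
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = x.reshape(n, num_groups, -1)
    mean = g.mean(axis=2, keepdims=True)   # statistics per sample, per group
    var = g.var(axis=2, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

def standardize_weights(w, eps=1e-5):
    """Weight standardization: zero mean, unit std per output channel.
    w: (C_out, C_in, kH, kW). Uses the weights only, never the batch."""
    flat = w.reshape(w.shape[0], -1)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    return ((flat - mean) / (std + eps)).reshape(w.shape)

def batch_norm(x, eps=1e-5):
    """Training-mode batch norm: statistics are shared across the batch axis."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Hypothetical data: keep sample 0 fixed and replace the rest of the batch.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 5, 5))
x_changed = x.copy()
x_changed[1:] = rng.normal(loc=3.0, size=(3, 8, 5, 5))

# Group norm output for sample 0 ignores the other batch elements ...
gn_same = np.allclose(group_norm(x, 4)[0], group_norm(x_changed, 4)[0])
# ... whereas batch norm output for sample 0 depends on them.
bn_same = np.allclose(batch_norm(x)[0], batch_norm(x_changed)[0])
print(gn_same, bn_same)
```

Here `gn_same` is `True` and `bn_same` is `False`: under group normalization the output for one image cannot carry information about the other images in the batch, which is the property that rules out an implicit contrastive signal through the normalization layer.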
Abstract
Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization)…
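The collapse concern described in the abstract can be illustrated with a small NumPy sketch of a BYOL-style objective. The loss form used here (squared distance between L2-normalized vectors, equal to 2 minus twice the cosine similarity) and the exponential-moving-average target update follow the original BYOL formulation rather than anything stated in this excerpt. The function names, the value of `tau`, and the toy vectors are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def byol_loss(online_prediction, target_projection):
    """Per-sample loss: 2 - 2 * cosine similarity.
    In training, the target side is treated as a constant (stop-gradient)."""
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return 2.0 - 2.0 * np.sum(p * z, axis=-1)

def ema_update(target_params, online_params, tau=0.99):
    """Move each target parameter slowly toward its online counterpart."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]

# Toy check of the loss range: aligned vectors give 0, opposite vectors give 4.
p = np.array([[1.0, 0.0], [0.0, 1.0]])
z = np.array([[1.0, 0.0], [0.0, -1.0]])
loss = byol_loss(p, z)

# A collapsed model that outputs the same constant vector for every view
# also reaches zero loss, since there is no term pushing samples apart.
constant = np.tile(np.array([0.3, -0.7, 0.2]), (5, 1))
collapsed_loss = byol_loss(constant, constant)

new_target = ema_update([np.array([1.0])], [np.array([0.0])], tau=0.9)
print(loss, collapsed_loss, new_target)
```

Because a constant representation minimizes this objective, the question of what prevents collapse in practice is non-trivial, and that is what motivates testing whether batch normalization is the responsible ingredient.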
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Batch Normalization · Group Normalization
