A Probabilistic Model Behind Self-Supervised Learning
Alice Bizeul, Bernhard Schölkopf, Carl Allen

TL;DR
This paper introduces a probabilistic generative model for self-supervised learning, unifying various methods and demonstrating improved representation quality, especially in style-dependent tasks.
Contribution
It proposes a generative latent variable model that unifies different SSL approaches and introduces SimVAE, a generative method that enhances representation learning.
Findings
SimVAE outperforms existing SSL methods on simple benchmarks.
The model provides a theoretical framework linking SSL to mutual information.
SimVAE narrows the gap between generative and discriminative methods.
Abstract
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels. A common task is to classify augmentations or different modalities of the data, which share semantic content (e.g. an object in an image) but differ in style (e.g. the object's location). Many approaches to self-supervised learning have been proposed, e.g. SimCLR, CLIP, and DINO, which have recently gained much attention for their representations achieving downstream performance comparable to supervised learning. However, a theoretical understanding of self-supervised methods eludes. Addressing this, we present a generative latent variable model for self-supervised learning and show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations, providing a unifying theoretical framework for these methods. The…
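The idea that related views share semantic content but differ in style can be illustrated with a toy sampling sketch. This is not the paper's model or SimVAE: the function name `sample_views`, the Gaussian choices, the linear decoder, and parameters such as `style_std` are all assumptions made purely for illustration.

```python
import numpy as np

def sample_views(num_groups, views_per_group, latent_dim=2, obs_dim=5,
                 style_std=0.1, seed=0):
    """Toy hierarchical sampler (illustrative assumption, not the paper's model)."""
    rng = np.random.default_rng(seed)
    # Shared semantic content: one vector y per group of related views.
    y = rng.normal(size=(num_groups, 1, latent_dim))
    # Per-view latents z scatter around their group's content y (style variation).
    z = y + style_std * rng.normal(size=(num_groups, views_per_group, latent_dim))
    # Hypothetical linear decoder mapping each latent to an observation x.
    W = rng.normal(size=(latent_dim, obs_dim))
    x = z @ W + 0.01 * rng.normal(size=(num_groups, views_per_group, obs_dim))
    return y, z, x

y, z, x = sample_views(num_groups=4, views_per_group=3)
print(z.shape, x.shape)
```

In this sketch, latents of views from the same group cluster around a common point, which is the kind of structure over representations that the abstract attributes to augmentations sharing content.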
Taxonomy
Topics: Machine Learning and Data Classification
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Layer Normalization · Vision Transformer · self-DIstillation with NO labels · Average Pooling · Dense Connections
