Understanding Self-Supervised Learning via Gaussian Mixture Models
Parikshit Bansal, Ali Kavis, Sujay Sanghavi

TL;DR
This paper provides a theoretical analysis of self-supervised learning methods, showing that they can identify optimal subspaces in Gaussian mixture models and filter out noise, and supports the analysis with experiments on synthetic data.
Contribution
It offers the first theoretical insights into why contrastive and non-contrastive self-supervised learning effectively find meaningful representations in Gaussian mixture models.
Findings
Contrastive learning finds optimal subspaces even with non-isotropic Gaussians.
Non-contrastive methods such as SimSiam provably achieve a similar optimal-subspace recovery.
Contrastive learning filters out noise, focusing on Fisher-optimal subspaces.
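To make the distinction behind "Fisher-optimal subspace" concrete, the NumPy sketch below compares two reference directions on a toy two-component mixture. The first is the top principal component, a vanilla spectral choice. The second is the Fisher (LDA) direction, Σ_w⁻¹(μ₁ − μ₀). All mixture parameters are illustrative assumptions, not the paper's setup. The sketch does not run any self-supervised method; it only shows which subspace the findings refer to, and why a variance-based method can pick the noise axis instead.

```python
import numpy as np

# Toy setup (assumed for illustration): two 2-D Gaussian components whose
# means differ only along axis 0, with a large noise variance along axis 1,
# i.e. a non-isotropic covariance shared by both components.
rng = np.random.default_rng(0)
n = 2000
mu0, mu1 = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
cov = np.diag([0.1, 9.0])  # small variance on the signal axis, large on the noise axis

x0 = rng.multivariate_normal(mu0, cov, size=n)
x1 = rng.multivariate_normal(mu1, cov, size=n)
X = np.vstack([x0, x1])

# Vanilla spectral method (PCA): top eigenvector of the total covariance.
total_cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(total_cov)  # eigenvalues come back in ascending order
pca_dir = eigvecs[:, -1]

# Fisher direction: inverse within-class covariance times the mean difference.
within_cov = 0.5 * (np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False))
fisher_dir = np.linalg.solve(within_cov, x1.mean(axis=0) - x0.mean(axis=0))
fisher_dir /= np.linalg.norm(fisher_dir)

print("PCA direction:   ", np.round(np.abs(pca_dir), 3))
print("Fisher direction:", np.round(np.abs(fisher_dir), 3))
```

In this toy case, PCA should align with the high-variance noise axis (1), while the Fisher direction should align with the mean-separation axis (0).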
Abstract
Self-supervised learning attempts to learn representations from unlabeled data; it does so via a loss function that encourages the embedding of a point to be close to that of its augmentations. This simple idea performs remarkably well, yet why it works is not precisely understood theoretically. In this paper we analyze self-supervised learning in a natural context: dimensionality reduction in Gaussian Mixture Models. Crucially, we define an augmentation of a data point as being another independent draw from the same underlying mixture component. We show that vanilla contrastive learning (specifically, the InfoNCE loss) is able to find the optimal lower-dimensional subspace even when the Gaussians are not isotropic, something that vanilla spectral techniques cannot do. We also prove a similar result for "non-contrastive" self-supervised learning (i.e., SimSiam loss). We…
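The abstract's key modelling choice is that an "augmentation" is another independent draw from the same mixture component. The NumPy sketch below makes this concrete by building such positive pairs and evaluating an InfoNCE loss for a linear encoder z = Wx. Several details are assumptions for illustration, not taken from the paper: the dot-product similarity, the temperature `tau`, the mixture parameters, and the batch size. The sketch only evaluates the loss for two fixed unit-norm projections; it does not train anything. The helper names `sample_pairs` and `info_nce` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture (assumed, not the paper's setup): two 2-D components whose means
# differ along axis 0, with a much larger variance along axis 1.
means = np.array([[-1.0, 0.0], [1.0, 0.0]])
cov = np.diag([0.1, 9.0])


def sample_pairs(batch_size):
    """Return (x, x_aug), where each x_aug[i] is an independent draw from x[i]'s component."""
    comps = rng.integers(0, len(means), size=batch_size)
    x = np.array([rng.multivariate_normal(means[c], cov) for c in comps])
    x_aug = np.array([rng.multivariate_normal(means[c], cov) for c in comps])
    return x, x_aug


def info_nce(W, x, x_aug, tau=1.0):
    """InfoNCE for a linear encoder z = W x with dot-product similarity (assumed form)."""
    z, z_aug = x @ W.T, x_aug @ W.T
    logits = z @ z_aug.T / tau  # logits[i, j] = <z_i, z_aug_j> / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability; softmax is unchanged
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for anchor i is its own augmentation, i.e. column i.
    return -np.mean(np.diag(log_probs))


x, x_aug = sample_pairs(256)
W_signal = np.array([[1.0, 0.0]])  # projects onto the mean-separation axis
W_noise = np.array([[0.0, 1.0]])   # projects onto the high-variance noise axis
loss_signal = info_nce(W_signal, x, x_aug)
loss_noise = info_nce(W_noise, x, x_aug)
print(f"InfoNCE, signal projection: {loss_signal:.3f}")
print(f"InfoNCE, noise projection:  {loss_noise:.3f}")
```

In this toy setting the signal projection should give the lower loss. Along the noise axis, a point and its "augmentation" are independent, so the positive pair carries no shared information. This is consistent with the abstract's claim but is not a demonstration of the theorem. With unnormalized dot products the loss also depends on projection scale, which is why both rows of `W` have unit norm here.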
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference
Methods: Contrastive Learning · InfoNCE
