Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation
Omar Chehab, Alexandre Gramfort, Aapo Hyvarinen

TL;DR
This paper studies which noise distribution is optimal for Noise-Contrastive Estimation, a classification-based approach to self-supervised learning. It shows that the optimal noise differs from the data distribution when the energy is unknown, but that simply using the data distribution as noise is nearly optimal in many cases.
Contribution
It provides a theoretical and empirical analysis of noise distribution optimality in Noise-Contrastive Estimation, challenging the common assumption that it should match the data distribution.
Findings
The optimal noise differs from the data distribution when the energy is unknown.
Using the data distribution as noise is nearly optimal in many cases.
The choice of noise is tied to the sample efficiency of the NCE estimator, defined as its asymptotic variance, or mean-squared error.
Abstract
Self-supervised learning is an increasingly popular approach to unsupervised learning, achieving state-of-the-art results. A prevalent approach consists in contrasting data points and noise points within a classification task: this requires a good noise distribution which is notoriously hard to specify. While a comprehensive theory is missing, it is widely assumed that the optimal noise distribution should in practice be made equal to the data distribution, as in Generative Adversarial Networks (GANs). We here empirically and theoretically challenge this assumption. We turn to Noise-Contrastive Estimation (NCE) which grounds this self-supervised task as an estimation problem of an energy-based model of the data. This ties the optimality of the noise distribution to the sample efficiency of the estimator, which is rigorously defined as its asymptotic variance, or mean-squared error. In…
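To make the setup in the abstract concrete, the following is a minimal sketch of NCE as a binary classification task, not the authors' implementation. It assumes a toy setting chosen purely for illustration: 1-D Gaussian data with unknown mean, a unit-variance Gaussian model, a wider Gaussian as noise, equal numbers of data and noise points, and a plain grid search as the optimizer. All function names and constants are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log-density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_sigmoid(t):
    """Numerically stable log(sigmoid(t))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))


def nce_loss(theta, data, noise, noise_mean, noise_std):
    """Logistic loss for telling data points apart from noise points.

    The classifier logit is log p_theta(x) - log p_n(x). With equal
    numbers of data and noise points there is no extra log-ratio term.
    The model p_theta = N(theta, 1) is an illustrative assumption.
    """
    def logit(x):
        return (log_normal_pdf(x, theta, 1.0)
                - log_normal_pdf(x, noise_mean, noise_std))

    data_term = sum(log_sigmoid(logit(x)) for x in data) / len(data)
    noise_term = sum(log_sigmoid(-logit(y)) for y in noise) / len(noise)
    return -(data_term + noise_term)


def estimate_mean(data, noise, noise_mean, noise_std):
    """Grid search over [-2, 4] for the theta that minimizes the NCE loss."""
    grid = [i * 0.02 for i in range(-100, 201)]
    return min(grid,
               key=lambda th: nce_loss(th, data, noise, noise_mean, noise_std))


rng = random.Random(0)  # fixed seed so the run is repeatable
true_mean = 1.0
noise_mean, noise_std = 0.0, 2.0  # noise deliberately differs from the data
data = [rng.gauss(true_mean, 1.0) for _ in range(1000)]
noise = [rng.gauss(noise_mean, noise_std) for _ in range(1000)]

theta_hat = estimate_mean(data, noise, noise_mean, noise_std)
print("NCE estimate of the mean:", theta_hat)
```

Changing `noise_mean` and `noise_std` and repeating the run over many seeds gives an empirical view of how the noise distribution affects the spread of `theta_hat`, which is the quantity the abstract refers to as the estimator's asymptotic variance.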
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis
