Understanding Self-Supervised Learning via Latent Distribution Matching
Fabian A Mikulasch, Friedemann Zenke

TL;DR
This paper introduces latent distribution matching (LDM), a unifying theoretical framework that explains diverse self-supervised learning methods and offers guiding principles for designing new ones.
Contribution
The paper proposes LDM as a unifying theory for SSL, uses it to derive a Bayesian filtering model for high-dimensional time series, and proves that predictive LDM yields identifiable representations under mild conditions.
Findings
LDM unifies diverse SSL methods under a common framework.
A Bayesian filtering model for high-dimensional time series is derived.
Predictive LDM yields identifiable latent representations even with nonlinear predictors.
Abstract
Self-supervised learning (SSL) excels at finding general-purpose latent representations from complex data, yet lacks a unifying theoretical framework that explains the diverse existing methods and guides the design of new ones. We cast SSL as latent distribution matching (LDM): learning representations that maximize their log-probability under an assumed latent model (alignment), while maximizing latent entropy to prevent collapse (uniformity). This view unifies independent component analysis with contrastive, non-contrastive, and predictive SSL methods, including stop-gradient approaches. Leveraging LDM, we derive a nonlinear, sampling-free Bayesian filtering model with a Kalman-based predictor for high-dimensional time series. We further prove that predictive LDM yields identifiable latent representations under mild assumptions, even with nonlinear predictors. Overall, LDM clarifies…
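The alignment-plus-uniformity objective described in the abstract can be illustrated with a small numerical sketch. This is not the paper's implementation: the isotropic Gaussian pair model, the Gaussian-fit entropy proxy, and the names `alignment`, `gaussian_entropy`, `ldm_objective`, and `sigma` are all assumptions chosen for illustration.

```python
import numpy as np


def alignment(z_a, z_b, sigma=1.0):
    """Mean log-probability (up to an additive constant) of paired latents
    under an ASSUMED latent model: z_b ~ N(z_a, sigma^2 I)."""
    sq_dist = np.sum((z_a - z_b) ** 2, axis=1)
    return float(np.mean(-0.5 * sq_dist / sigma**2))


def gaussian_entropy(z, eps=1e-6):
    """Entropy of a Gaussian fitted to the latents.
    ASSUMPTION: a crude stand-in for the latent entropy (uniformity) term."""
    d = z.shape[1]
    # eps keeps the covariance non-singular when the latents collapse
    cov = np.atleast_2d(np.cov(z, rowvar=False)) + eps * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return float(0.5 * (d * np.log(2 * np.pi * np.e) + logdet))


def ldm_objective(z_a, z_b, sigma=1.0):
    """Quantity to maximize: alignment plus entropy of the pooled latents."""
    z = np.concatenate([z_a, z_b], axis=0)
    return alignment(z_a, z_b, sigma) + gaussian_entropy(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_a = rng.normal(size=(256, 4))                 # latents of view A
    z_b = z_a + 0.1 * rng.normal(size=(256, 4))     # nearby latents of view B
    collapsed = np.zeros((256, 4))                  # every input maps to one point

    print("spread   :", ldm_objective(z_a, z_b))
    print("collapsed:", ldm_objective(collapsed, collapsed))
```

Under these assumptions a collapsed representation aligns perfectly but scores far lower overall, because the entropy term penalizes the degenerate covariance. In practice the latents would come from a trained encoder and the objective would be maximized by gradient ascent.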
