Gaussian Joint Embeddings For Self-Supervised Representation Learning
Yongchao Huang

TL;DR
This paper introduces Gaussian Joint Embeddings (GJE) and Gaussian Mixture Joint Embeddings (GMJE), probabilistic models for self-supervised learning that improve multi-modal representation and uncertainty estimation.
Contribution
It proposes a probabilistic framework for self-supervised learning that models joint densities, addressing collapse issues and enabling principled uncertainty and latent geometry control.
Findings
GMJE recovers complex conditional structures in multi-modal tasks.
GMJE learns competitive discriminative representations.
Latent densities from GMJE are better suited for unconditional sampling.
Abstract
Self-supervised representation learning often relies on deterministic predictive architectures to align context and target views in latent space. While effective in many settings, such methods are limited in genuinely multi-modal inverse problems, where squared-loss prediction collapses towards conditional averages, and they frequently depend on architectural asymmetries to prevent representation collapse. In this work, we propose a probabilistic alternative based on generative joint modeling. We introduce Gaussian Joint Embeddings (GJE) and its multi-modal extension, Gaussian Mixture Joint Embeddings (GMJE), which model the joint density of context and target representations and replace black-box prediction with closed-form conditional inference under an explicit probabilistic model. This yields principled uncertainty estimates and a covariance-aware objective for controlling latent…
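The "closed-form conditional inference" mentioned in the abstract can be illustrated with the standard conditioning identity for a single joint Gaussian. The following is a minimal sketch, not the authors' implementation: the function name `gaussian_conditional`, the block layout of the joint mean and covariance (context first, target second), and the toy numbers are all assumptions made for illustration. Given a joint Gaussian over a context embedding x and a target embedding y, the conditional of y given x has mean mu_y + S_yx S_xx^{-1} (x - mu_x) and covariance S_yy - S_yx S_xx^{-1} S_xy.

```python
import numpy as np

def gaussian_conditional(mu, sigma, x, dx):
    """Condition a joint Gaussian over (x, y) on an observed x.

    Hypothetical helper for illustration only.
    mu:    (dx + dy,) joint mean, context block first
    sigma: (dx + dy, dx + dy) joint covariance
    x:     (dx,) observed context embedding
    dx:    dimension of the context block
    """
    mu_x, mu_y = mu[:dx], mu[dx:]
    s_xx = sigma[:dx, :dx]
    s_xy = sigma[:dx, dx:]
    s_yy = sigma[dx:, dx:]
    # Solve S_xx A = S_xy rather than forming an explicit inverse.
    a = np.linalg.solve(s_xx, s_xy)        # A = S_xx^{-1} S_xy, shape (dx, dy)
    cond_mean = mu_y + a.T @ (x - mu_x)    # mu_y + S_yx S_xx^{-1} (x - mu_x)
    cond_cov = s_yy - s_xy.T @ a           # S_yy - S_yx S_xx^{-1} S_xy
    return cond_mean, cond_cov

# Toy 1-D context and 1-D target with correlation 0.8 (assumed values).
mu = np.zeros(2)
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
mean, cov = gaussian_conditional(mu, sigma, np.array([1.0]), dx=1)
print(mean, cov)
```

The conditional covariance does not depend on the observed x, which is one reason a single Gaussian cannot represent multi-modal conditionals and a mixture extension such as GMJE is needed for that case.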
