Gaussian Joint Embeddings For Self-Supervised Representation Learning

Yongchao Huang

arXiv:2603.26799 · cs.LG · March 31, 2026

TL;DR

This paper introduces Gaussian Joint Embeddings (GJE) and Gaussian Mixture Joint Embeddings (GMJE), probabilistic models for self-supervised learning that handle multi-modal prediction targets and provide uncertainty estimates.

Contribution

The paper proposes a probabilistic framework for self-supervised learning that models the joint density of context and target representations, addressing representation collapse and enabling principled uncertainty estimates and control over latent geometry.
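Under a single joint Gaussian over stacked context and target embeddings, closed-form conditional inference reduces to the textbook Gaussian conditioning formulas: E[z_t | z_c] = mu_t + Sigma_tc Sigma_cc^{-1} (z_c - mu_c) and Cov[z_t | z_c] = Sigma_tt - Sigma_tc Sigma_cc^{-1} Sigma_ct. The sketch below illustrates those standard formulas only; the helper `gaussian_condition`, the moment estimates, and the synthetic embeddings are hypothetical, and this is not presented as the paper's training procedure.

```python
import numpy as np


def gaussian_condition(mu, sigma, z_c, d_c):
    """Condition a joint Gaussian N(mu, sigma) over [z_c, z_t] on an observed z_c.

    The first d_c coordinates form the context block, the rest the target block.
    Returns the conditional mean and covariance of z_t given z_c.
    """
    mu_c, mu_t = mu[:d_c], mu[d_c:]
    s_cc = sigma[:d_c, :d_c]  # Sigma_cc: context-context block
    s_ct = sigma[:d_c, d_c:]  # Sigma_ct: context-target block
    s_tt = sigma[d_c:, d_c:]  # Sigma_tt: target-target block
    # Gain matrix Sigma_tc Sigma_cc^{-1}, obtained by a linear solve rather than
    # an explicit inverse (sigma is symmetric, so Sigma_tc = Sigma_ct^T).
    gain = np.linalg.solve(s_cc, s_ct).T
    cond_mean = mu_t + gain @ (z_c - mu_c)
    # Schur complement: Sigma_tt - Sigma_tc Sigma_cc^{-1} Sigma_ct
    cond_cov = s_tt - gain @ s_ct
    return cond_mean, cond_cov


# Usage: estimate joint moments from hypothetical paired embeddings, then condition.
rng = np.random.default_rng(0)
zc = rng.normal(size=(5000, 2))  # 2-D context embeddings
zt = zc @ np.array([[1.0], [-2.0]]) + 0.1 * rng.normal(size=(5000, 1))  # 1-D targets
joint = np.hstack([zc, zt])
mean_t, cov_t = gaussian_condition(joint.mean(axis=0), np.cov(joint, rowvar=False),
                                   np.array([1.0, 1.0]), d_c=2)
print(mean_t, cov_t)
```

As a hand-checkable case, with `mu = np.zeros(2)`, `sigma = np.array([[1.0, 0.5], [0.5, 1.0]])` and an observed context of 2.0, the conditional mean is 1.0 and the conditional variance is 0.75. The variance term is what supplies an uncertainty estimate alongside the point prediction.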

Findings

1. GMJE recovers complex conditional structures in multi-modal tasks.
2. GMJE learns competitive discriminative representations.
3. Latent densities from GMJE are better suited for unconditional sampling.
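In the mixture case, conditioning a Gaussian mixture on the context yields another Gaussian mixture. Each component is conditioned exactly as in the single-Gaussian case, and its weight is rescaled by how well that component explains the observed context. The NumPy sketch below shows this standard result; the function names and the two-component example are hypothetical illustrations, not code from the paper.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, computed via a Cholesky factor."""
    d = x.shape[0]
    chol = np.linalg.cholesky(cov)
    y = np.linalg.solve(chol, x - mean)  # whitened residual L^{-1}(x - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + y @ y)


def gmm_condition(weights, means, covs, z_c, d_c):
    """Condition a Gaussian mixture over [z_c, z_t] on an observed context z_c.

    Returns posterior component weights plus per-component conditional means
    and covariances of z_t; together they define the conditional mixture.
    """
    log_w, cond_means, cond_covs = [], [], []
    for pi_k, mu, sigma in zip(weights, means, covs):
        mu_c, mu_t = mu[:d_c], mu[d_c:]
        s_cc, s_ct, s_tt = sigma[:d_c, :d_c], sigma[:d_c, d_c:], sigma[d_c:, d_c:]
        # Unnormalised log responsibility: log pi_k + log N(z_c; mu_c, Sigma_cc).
        log_w.append(np.log(pi_k) + gaussian_logpdf(z_c, mu_c, s_cc))
        gain = np.linalg.solve(s_cc, s_ct).T  # Sigma_tc Sigma_cc^{-1}
        cond_means.append(mu_t + gain @ (z_c - mu_c))
        cond_covs.append(s_tt - gain @ s_ct)
    log_w = np.array(log_w)
    post = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return post / post.sum(), np.array(cond_means), np.array(cond_covs)


# Usage: two equally weighted components share the same context marginal but
# place the target at +2 or -2 (a hypothetical bimodal example).
cov = np.array([[1.0, 0.0], [0.0, 0.1]])
w, m, c = gmm_condition([0.5, 0.5], [np.array([0.0, 2.0]), np.array([0.0, -2.0])],
                        [cov, cov], np.array([0.0]), d_c=1)
print(w, m.ravel())
```

Conditioning on z_c = 0 here returns weights of 0.5 each and conditional means of +2 and -2, so both modes survive. A single squared-loss predictor would output their average, 0, which lies between the two modes.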

Abstract

Self-supervised representation learning often relies on deterministic predictive architectures to align context and target views in latent space. While effective in many settings, such methods are limited in genuinely multi-modal inverse problems, where squared-loss prediction collapses towards conditional averages, and they frequently depend on architectural asymmetries to prevent representation collapse. In this work, we propose a probabilistic alternative based on generative joint modeling. We introduce Gaussian Joint Embeddings (GJE) and its multi-modal extension, Gaussian Mixture Joint Embeddings (GMJE), which model the joint density of context and target representations and replace black-box prediction with closed-form conditional inference under an explicit probabilistic model. This yields principled uncertainty estimates and a covariance-aware objective for controlling latent…
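The claim that squared-loss prediction collapses towards conditional averages can be checked numerically. The sketch below assumes a toy target that is equally likely to be -1 or +1 for a fixed context (a hypothetical setup, not an experiment from the paper) and searches a grid of constant predictions for the one with the lowest mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical multi-modal inverse problem: for one fixed context, the target
# is equally likely to be -1 or +1 (two modes, nothing in between).
targets = rng.choice([-1.0, 1.0], size=10_000)

# Mean squared error of every constant prediction on a grid with step 0.01.
candidates = np.linspace(-1.5, 1.5, 301)
mse = ((targets[None, :] - candidates[:, None]) ** 2).mean(axis=1)
best = candidates[np.argmin(mse)]

# The minimiser sits at the grid point nearest the sample mean (about 0),
# a value the target never takes.
print(f"MSE-optimal prediction: {best:.2f}, sample mean: {targets.mean():.2f}")
```

The minimiser lands next to the sample mean, near 0, even though no target value is near 0. This is the averaging behaviour the abstract describes, and it is why modelling the full conditional density can be preferable in multi-modal settings.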
