Probabilistic Variational Contrastive Learning

Minoh Jeong; Seonho Kim; Alfred Hero

arXiv:2506.10159 · cs.LG · October 8, 2025

TL;DR

This paper introduces Variational Contrastive Learning (VCL), a probabilistic framework that enhances contrastive learning by providing uncertainty quantification and mitigating issues like dimensional collapse, while maintaining high classification performance.

Contribution

VCL is a novel decoder-free probabilistic contrastive learning method that models embeddings as distributions, enabling uncertainty estimation and improved robustness.

Findings

01. VCL reduces dimensional collapse in embeddings.
02. VCL improves mutual information with class labels.
03. VCL matches or surpasses deterministic baselines in classification accuracy.

Abstract

Deterministic embeddings learned by contrastive learning (CL) methods such as SimCLR and SupCon achieve state-of-the-art performance but lack a principled mechanism for uncertainty quantification. We propose Variational Contrastive Learning (VCL), a decoder-free framework that maximizes the evidence lower bound (ELBO) by interpreting the InfoNCE loss as a surrogate reconstruction term and adding a KL divergence regularizer to a uniform prior on the unit hypersphere. We model the approximate posterior $q_\theta(z \mid x)$ as a projected normal distribution, enabling the sampling of probabilistic embeddings. Our two instantiations, VSimCLR and VSupCon, replace deterministic embeddings with samples from $q_\theta(z \mid x)$ and incorporate a normalized KL term into the loss. Experiments on multiple benchmarks demonstrate that VCL mitigates dimensional collapse, enhances mutual information with class…
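The recipe in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. It assumes a diagonal Gaussian whose sample is L2-normalized as the projected normal posterior, and a SimCLR-style one-directional InfoNCE with temperature `tau`. In place of the KL to the uniform prior it substitutes the closed-form KL between the pre-projection Gaussian and N(0, I). Projecting N(0, I) onto the sphere yields the uniform distribution, so by the data processing inequality this substitute is an upper bound; it is an illustrative assumption, not necessarily the paper's "normalized KL term". The names `vcl_style_loss`, `beta`, and `tau` are hypothetical, and nothing here computes gradients.

```python
import math
import random


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_projected_normal(mu, sigma, rng):
    """Reparameterized draw from a projected normal: z = w / ||w||, w = mu + sigma * eps."""
    w = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    return l2_normalize(w)


def gaussian_kl_to_standard(mu, sigma):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)).

    Assumption: used as a stand-in for KL(projected normal || uniform on the sphere),
    which it upper-bounds by the data processing inequality.
    """
    return 0.5 * sum(s * s + m * m - 1.0 - 2.0 * math.log(s) for m, s in zip(mu, sigma))


def info_nce(anchors, positives, tau=0.1):
    """One-directional InfoNCE: row i of `positives` is the positive for anchor i."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) / tau for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(logit - top) for logit in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


def vcl_style_loss(mu1, sig1, mu2, sig2, beta=1e-3, tau=0.1, seed=0):
    """Hypothetical VCL-style loss: InfoNCE on sampled embeddings of two views
    plus a batch-averaged KL penalty weighted by beta."""
    rng = random.Random(seed)
    z1 = [sample_projected_normal(m, s, rng) for m, s in zip(mu1, sig1)]
    z2 = [sample_projected_normal(m, s, rng) for m, s in zip(mu2, sig2)]
    kls = [gaussian_kl_to_standard(m, s) for m, s in zip(mu1 + mu2, sig1 + sig2)]
    return info_nce(z1, z2, tau) + beta * sum(kls) / len(kls)


# Hypothetical batch of two samples with 3-D posterior parameters for each view.
mu_view1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mu_view2 = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
sigmas = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
loss = vcl_style_loss(mu_view1, sigmas, mu_view2, sigmas)
```

Drawing `mu + sigma * eps` before normalizing is the reparameterization trick: written in an autodiff framework, the same arithmetic keeps each sample differentiable with respect to `mu` and `sigma`.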

Peer Reviews

Decision: ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating: 2 · Confidence: 4

Strengths

The proposed method leads to an easy-to-implement objective. The empirical results show some improvement over conventional contrastive learning methods, especially in mitigating collapse phenomena.

Weaknesses

1. The motivation of this paper is not compelling. Why is it a limitation that conventional contrastive learning employs a deterministic map to represent each sample? What is the meaning of "uncertainty" in a representation when there is no "true" representative point in the latent space? How does the quantified uncertainty affect downstream tasks?
2. The key approximation result (12) is not fully justified. It is based on the approximation (10), which in turn is based on Lemma 3.1. H…

Reviewer 02 · Rating: 2 · Confidence: 4

Strengths

The paper provides a coherent theoretical reinterpretation of contrastive learning through a probabilistic lens. By formulating InfoNCE as a surrogate reconstruction term in an ELBO objective, it bridges a gap between variational inference and contrastive learning, offering a fresh theoretical grounding that could allow for further analytical work in this area. The introduction of a uniform spherical prior and projected normal posterior shows careful consideration of embedding geometry. This de…

Weaknesses

1. The paper contains several flaws in its mathematical derivations. While the experimental framework remains sound, these issues require correction to ensure that the theoretical contributions align with the stated goal of providing a rigorous connection between InfoNCE and the ELBO.
   - **Incorrect sign propagation in the InfoNCE-ELBO connection.** The paper's central theoretical contribution, that minimizing InfoNCE asymptotically maximizes the ELBO reconstruction term, is incorrectly derived. In the proof of…

Reviewer 03 · Rating: 2 · Confidence: 4

Strengths

1. The topic is highly relevant and timely, as **uncertainty modeling** and **probabilistic embeddings** are increasingly important in self-supervised and contrastive learning research.
2. The proposed approach is conceptually simple, easy to implement, and broadly compatible with existing contrastive frameworks without requiring architectural changes.
3. The paper is **clearly written** and well organized, with helpful figures, examples, and ablation studies that aid understanding.
4. The…

Weaknesses

## Major
1. **Issues in the theoretical derivations.** I believe several of the proofs contain conceptual or mathematical inaccuracies:
   - **Appendix B.1:** Since $z$ is a continuous variable, $H(r)$ represents a *differential entropy*, which can be negative. This invalidates the step in the proof that relies on $H(r) \ge 0$.
   - **Appendix C.1:** The assumption that $g$ is invertible seems unjustified. As stated in the paper, the representations are *compact versions* of t…
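Reviewer 03's Appendix B.1 point rests on a standard fact: unlike discrete entropy, differential entropy can be negative. A minimal check using the textbook closed forms for a 1-D Gaussian, $h = \tfrac{1}{2}\log(2\pi e \sigma^2)$, and for $\mathrm{Uniform}(a, b)$, $h = \log(b - a)$, makes this concrete. The function names are illustrative only and are unrelated to the paper's $r$.

```python
import math


def gaussian_differential_entropy(sigma):
    """Differential entropy in nats of a 1-D Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def uniform_differential_entropy(a, b):
    """Differential entropy in nats of Uniform(a, b), assuming a < b."""
    return math.log(b - a)


# Any Gaussian with sigma < 1 / sqrt(2 * pi * e) (about 0.242) has negative entropy,
# as does any uniform distribution on an interval shorter than 1.
narrow_gaussian = gaussian_differential_entropy(0.1)
short_uniform = uniform_differential_entropy(0.0, 0.5)
print(narrow_gaussian, short_uniform)  # both values are below zero
```

Because these entropies shrink without bound as the distribution concentrates, a bound of the form $H(r) \ge 0$ cannot be assumed for a continuous variable.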

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Adversarial Robustness in Machine Learning

Methods: Dense Connections · Random Gaussian Blur · Normalized Temperature-scaled Cross Entropy Loss · Feedforward Network · SimCLR · InfoNCE · Contrastive Learning