A Mathematical Perspective On Contrastive Learning
Ricardo Baptista, Andrew M. Stuart, Son Tran

TL;DR
This paper offers a mathematical framework for multimodal contrastive learning, interpreting it as the optimization of parameterized encoders that define conditional probability distributions for each modality given the other, and introduces novel variants and generalizations with theoretical and experimental validation.
Contribution
It provides a probabilistic perspective on contrastive learning, leading to new loss functions, metrics, and variants applicable to generative and retrieval tasks.
Findings
Probabilistic framework unifies contrastive learning and generative models.
Novel contrastive variants improve mode-seeking and generative tasks.
Experimental results validate theoretical insights on Gaussian data, MNIST, and oceanography data.
Abstract
Multimodal contrastive learning is a methodology for linking different data modalities; the canonical example is linking image and text data. The methodology is typically framed as the identification of a set of encoders, one for each modality, that align representations within a common latent space. In this work, we focus on the bimodal setting and interpret contrastive learning as the optimization of (parameterized) encoders that define conditional probability distributions, for each modality conditioned on the other, consistent with the available data. This provides a framework for multimodal algorithms such as crossmodal retrieval, which identifies the mode of one of these conditional distributions, and crossmodal classification, which is similar to retrieval but includes a fine-tuning step to make it task specific. The framework we adopt also gives rise to crossmodal generative…
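The probabilistic reading described in the abstract can be illustrated with a minimal sketch. It assumes a CLIP-style softmax over inner products of unit-normalized embeddings within a batch; the function names, the temperature `tau`, and the use of NumPy are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def log_softmax(z, axis):
    # Numerically stable log-softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def conditionals(u, v, tau=0.1):
    # u, v: (n, d) arrays of unit-normalized embeddings, one row per paired sample.
    # Assumption: similarity is the inner product scaled by a temperature tau.
    logits = u @ v.T / tau
    log_p_v_given_u = log_softmax(logits, axis=1)  # row i: distribution over v_j given u_i
    log_p_u_given_v = log_softmax(logits, axis=0)  # column j: distribution over u_i given v_j
    return log_p_v_given_u, log_p_u_given_v

def contrastive_loss(u, v, tau=0.1):
    # Symmetric negative log-likelihood of the true pairs (the diagonal entries).
    a, b = conditionals(u, v, tau)
    return -0.5 * (np.mean(np.diag(a)) + np.mean(np.diag(b)))

def retrieve(u, v, tau=0.1):
    # Crossmodal retrieval as mode-finding: for each u_i, return the index of
    # the candidate v_j with the highest conditional probability.
    a, _ = conditionals(u, v, tau)
    return a.argmax(axis=1)
```

In this sketch, training would minimize `contrastive_loss` over the encoder parameters that produce `u` and `v`, and `retrieve` then reads off the mode of the empirical conditional distribution over the candidate set.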
Taxonomy
Methods: ADaptive gradient method with the OPTimal convergence rate · Focus · Contrastive Learning · ALIGN · Sparse Evolutionary Training
