Contrastive Learning as Kernel Approximation
Konstantinos Christopher Tsiolis

TL;DR
This paper explores how contrastive learning methods implicitly approximate positive semidefinite kernels, providing a theoretical understanding of their role in feature learning from unlabelled data.
Contribution
It offers a theoretical analysis linking contrastive loss minimizers to kernel approximation, enhancing understanding of contrastive learning's foundations.
Findings
Contrastive loss minimizers implicitly approximate PSD kernels.
Kernel approximation relates to feature representations in contrastive learning.
Provides a theoretical framework connecting contrastive learning and kernel methods.
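As a rough illustration of what approximating a PSD kernel with low-dimensional features can mean, the sketch below builds features whose inner products approximate a kernel matrix. It is not the paper's construction: the RBF kernel, the function names `rbf_kernel` and `low_rank_features`, and the random data are all assumptions chosen for illustration. It relies only on the standard fact that the top-d eigenpairs of a symmetric PSD matrix give its best rank-d approximation in Frobenius norm.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Hypothetical target kernel, chosen only for illustration.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def low_rank_features(K, d):
    # Top-d eigenpairs of a symmetric PSD matrix K give features F (n x d)
    # such that F @ F.T is the best rank-d approximation of K.
    eigvals, eigvecs = np.linalg.eigh(K)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]       # indices of the d largest
    lam = np.clip(eigvals[top], 0.0, None)    # guard against tiny negatives
    return eigvecs[:, top] * np.sqrt(lam)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                 # synthetic inputs (assumption)
K = rbf_kernel(X)

# Relative approximation error of the feature Gram matrix for several d.
errors = {}
for d in (2, 8, 32):
    F = low_rank_features(K, d)
    errors[d] = np.linalg.norm(K - F @ F.T) / np.linalg.norm(K)
```

The Gram matrix `F @ F.T` of any feature matrix is PSD by construction, and the relative error in `errors` cannot increase as the feature dimension `d` grows.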
Abstract
In standard supervised machine learning, it is necessary to provide a label for every input in the data. While raw data in many application domains is easily obtainable on the Internet, manual labelling of this data is prohibitively expensive. To circumvent this issue, contrastive learning methods produce low-dimensional vector representations (also called features) of high-dimensional inputs on large unlabelled datasets. This is done by training with a contrastive loss function, which enforces that similar inputs have high inner product and dissimilar inputs have low inner product in the feature space. Rather than annotating each input individually, it suffices to define a means of sampling pairs of similar and dissimilar inputs. Contrastive features can then be fed as inputs to supervised learning systems on much smaller labelled datasets to obtain high accuracy on end tasks of…
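The behaviour the abstract attributes to a contrastive loss, high inner product for similar pairs and low inner product for dissimilar pairs, can be sketched with a simple logistic loss on inner products. This is only one common form and is an assumption here; the excerpt does not specify which loss the paper analyses. The function name `contrastive_logistic_loss` and the toy feature vectors are hypothetical.

```python
import numpy as np

def contrastive_logistic_loss(anchor, positive, negative):
    # Each argument is an (n, d) array of feature vectors.
    # Row i of `positive` is a similar sample for row i of `anchor`;
    # row i of `negative` is a dissimilar sample.
    pos_scores = np.sum(anchor * positive, axis=1)   # inner products, similar pairs
    neg_scores = np.sum(anchor * negative, axis=1)   # inner products, dissimilar pairs
    # logaddexp(0, -s) equals -log(sigmoid(s)), computed stably.
    pos_term = np.logaddexp(0.0, -pos_scores)        # small when pos_scores is high
    neg_term = np.logaddexp(0.0, neg_scores)         # small when neg_scores is low
    return float(np.mean(pos_term + neg_term))

rng = np.random.default_rng(0)
anchor = rng.normal(size=(4, 3))                     # toy features (assumption)

# Similar pairs aligned with the anchor, dissimilar pairs opposed: low loss.
aligned = contrastive_logistic_loss(anchor, anchor, -anchor)
# Roles swapped: similar pairs opposed, dissimilar pairs aligned: high loss.
misaligned = contrastive_logistic_loss(anchor, -anchor, anchor)
```

With all-zero features this loss equals 2 log 2, so `aligned` falls below that value and `misaligned` rises above it. Only a rule for sampling the similar and dissimilar pairs is needed, not a label for each input.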
Taxonomy
Topics: Machine Learning and ELM
Methods: Contrastive Learning
