Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions
Daniel D. Johnson, Ayoub El Hanchi, Chris J. Maddison

TL;DR
This paper shows that several contrastive learning methods can be viewed as approximating a kernel function, and that combining this kernel with PCA yields a basis that is optimal for approximately view-invariant functions. The result is supported by theoretical guarantees and empirical validation.
Contribution
It provides a theoretical framework linking contrastive learning to kernel methods and PCA, establishing optimality of the learned basis for view-invariant functions.
Findings
Multiple existing contrastive learning methods can be reinterpreted as approximating a fixed positive-pair kernel.
Kernel PCA applied to contrastive models can recover eigenfunctions of the positive-pair Markov chain.
Empirical results show Kernel PCA improves representation quality depending on parameters.
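The link between the positive-pair kernel and the Markov chain eigenfunctions can be illustrated on a small discrete problem. The sketch below is not the authors' code. It assumes the positive-pair kernel has the density-ratio form K(a1, a2) = p_+(a1, a2) / (p(a1) p(a2)), and it builds a hypothetical toy joint distribution over six views from three latent classes. All function and variable names are illustrative. Under these assumptions, the eigenfunctions of the kernel operator (weighted by the marginal p) coincide with those of the chain that steps from a view to one of its positive pairs.

```python
import numpy as np


def positive_pair_kernel(joint):
    """Return K(a1, a2) = p_+(a1, a2) / (p(a1) p(a2)) and the marginal p.

    `joint` is a symmetric matrix of positive-pair probabilities (assumed form).
    """
    marginal = joint.sum(axis=1)
    return joint / np.outer(marginal, marginal), marginal


def kernel_eigenfunctions(kernel, marginal):
    """Eigen-decompose the operator f -> sum_a K(., a) f(a) p(a).

    Symmetrizing with sqrt(p) lets us use a symmetric eigensolver; dividing
    the eigenvectors by sqrt(p) maps them back to functions of the view.
    """
    root = np.sqrt(marginal)
    sym = root[:, None] * kernel * root[None, :]
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    return eigvals[order], eigvecs[:, order] / root[:, None]


# Hypothetical toy data: two views are a positive pair when they are drawn
# independently from the same latent class z.
rng = np.random.default_rng(0)
n_views, n_latent = 6, 3
cond = rng.dirichlet(np.ones(n_views), size=n_latent)  # p(a | z), shape (3, 6)
prior = np.full(n_latent, 1.0 / n_latent)              # p(z)
joint = (cond.T * prior) @ cond                        # sum_z p(z) p(a1|z) p(a2|z)

kernel, marginal = positive_pair_kernel(joint)
eigvals, eigfuncs = kernel_eigenfunctions(kernel, marginal)

# Positive-pair Markov chain: step from a view to one of its positive pairs.
transition = joint / marginal[:, None]

# Each kernel eigenfunction is also an eigenfunction of the chain.
residual = np.abs(transition @ eigfuncs - eigfuncs * eigvals).max()
print("largest eigenvalue:", eigvals[0])
print("max eigenfunction residual:", residual)

# A d-dimensional representation keeps the top-d eigenfunctions per view.
representation = eigfuncs[:, :3]
```

In this toy setting the kernel is known exactly. In practice a contrastive model would only approximate it, so the recovered eigenfunctions would be approximate as well.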
Abstract
Contrastive learning is a powerful framework for learning self-supervised representations that generalize well to downstream supervised tasks. We show that multiple existing contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel. We then prove that a simple representation obtained by combining this kernel with PCA provably minimizes the worst-case approximation error of linear predictors, under a straightforward assumption that positive pairs have similar labels. Our analysis is based on a decomposition of the target function in terms of the eigenfunctions of a positive-pair Markov chain, and a surprising equivalence between these eigenfunctions and the output of Kernel PCA. We give generalization bounds for downstream linear prediction using our Kernel PCA representation, and show empirically on a set of synthetic…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Speech Recognition and Synthesis
Methods: Contrastive Learning · Principal Components Analysis
