Statistical Consistency and Generalization of Contrastive Representation Learning

Yuanfan Li; Xiyuan Wei; Tianbao Yang; Yiming Ying

arXiv:2605.02116·cs.LG·May 21, 2026

TL;DR

This paper develops a comprehensive statistical learning theory for contrastive representation learning, addressing its consistency, generalization bounds, and retrieval performance, supported by large-scale experiments.

Contribution

The paper provides the first unified theoretical framework for CRL: it establishes statistical consistency, derives generalization bounds, and analyzes the impact of negative samples.

Findings

01. Contrastive loss is statistically consistent with optimal ranking.

02. Generalization bounds of order O(1/m + 1/√n) and O(1/√m + 1/√n) are derived.

03. Large negative sets empirically improve CRL performance, explained by theory.

Abstract

Contrastive representation learning (CRL) underpins many modern foundation models. Despite recent theoretical progress, existing analyses suffer from several key limitations: (i) the statistical consistency of CRL remains poorly understood; (ii) available generalization bounds deteriorate as the number of negative samples increases, contradicting the empirical benefits of large negative sets; and (iii) the retrieval performance of CRL has received limited theoretical attention. In this paper, we develop a unified statistical learning theory for CRL. For downstream tasks, we evaluate retrieval quality using an AUC-type population criterion and show that the contrastive loss is statistically consistent with optimal ranking. We further establish a calibration-style inequality that quantitatively relates excess contrastive risk to excess retrieval suboptimality. For upstream…
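To make the two quantities in the abstract concrete, here is a minimal sketch of a contrastive loss and an AUC-type retrieval score computed on the same similarity matrix. This excerpt does not give the paper's exact definitions, so the sketch assumes the common InfoNCE form of the contrastive loss and a simple pairwise-ranking AUC. The function names (`info_nce_loss`, `retrieval_auc`, `demo`), the temperature `tau`, and the synthetic data are all hypothetical and used for illustration only.

```python
import numpy as np


def info_nce_loss(sim, tau=0.1):
    """Mean InfoNCE loss for a square similarity matrix (assumed form).

    sim[i, j] is the similarity between anchor i and candidate j.
    The diagonal holds the positive pairs; off-diagonal entries act as negatives.
    """
    logits = sim / tau
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def retrieval_auc(sim):
    """Fraction of (anchor, negative) pairs scored below the anchor's positive.

    Ties count as half a win, as in the usual empirical AUC.
    """
    n = sim.shape[0]
    pos = np.diag(sim)[:, None]          # positive score for each anchor
    off_diag = ~np.eye(n, dtype=bool)    # mask selecting the negatives
    wins = ((pos > sim) & off_diag).sum()
    ties = ((pos == sim) & off_diag).sum()
    return float((wins + 0.5 * ties) / off_diag.sum())


def demo(n=64, dim=16, noise=0.5, seed=0):
    """Synthetic example: positives are noisy copies of the anchors."""
    rng = np.random.default_rng(seed)
    anchors = rng.normal(size=(n, dim))
    positives = anchors + noise * rng.normal(size=(n, dim))
    # Normalize rows so the dot product is a cosine similarity.
    anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)
    positives /= np.linalg.norm(positives, axis=1, keepdims=True)
    sim = anchors @ positives.T
    return info_nce_loss(sim), retrieval_auc(sim)


if __name__ == "__main__":
    loss, auc = demo()
    print(f"InfoNCE loss: {loss:.4f}, retrieval AUC: {auc:.4f}")
```

In this sketch, varying `noise` in `demo` changes both the loss and the ranking score on the same similarity matrix, which gives an informal feel for the kind of relationship a calibration-style inequality describes. It is not a reproduction of the paper's experiments or bounds.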
