An Augmentation Overlap Theory of Contrastive Learning

Qi Zhang; Yifei Wang; Yisen Wang

arXiv:2511.03114 · cs.LG · November 6, 2025

TL;DR

This paper introduces the augmentation overlap theory to explain why contrastive learning works: aggressive data augmentations make the augmented views of intra-class samples overlap more, so aligning positive pairs clusters intra-class samples together. The theory also yields a new unsupervised evaluation metric.

Contribution

It proposes a new theoretical framework based on augmentation overlap, relaxing the conditional independence assumption used in earlier analyses and deriving bounds on downstream performance.

Findings

01. Supports the augmentation overlap theory with theoretical bounds.

02. Develops an unsupervised metric correlating with downstream performance.

03. Provides code for practical implementation.

Abstract

Recently, self-supervised contrastive learning has achieved great success on various tasks. However, its underlying mechanism remains unclear. In this paper, we first provide the tightest bounds based on the widely adopted assumption of conditional independence. Further, we relax the conditional independence assumption to a more practical assumption of augmentation overlap and derive asymptotically closed bounds for the downstream performance. Our proposed augmentation overlap theory hinges on the insight that the supports of different intra-class samples become more overlapped under aggressive data augmentations, so simply aligning the positive samples (augmented views of the same sample) can make contrastive learning cluster intra-class samples together. Moreover, from the newly derived augmentation overlap perspective, we develop an unsupervised metric for the…
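The overlap insight in the abstract can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions, not the paper's construction: samples are 1-D points, augmenting a point `x` draws a view uniformly from `[x - radius, x + radius]`, and the data and class labels are hypothetical. Two samples are linked when their view supports overlap; under perfect alignment an encoder must map all views of a sample to one embedding, so any view shared by two samples forces them (and, by transitivity, their whole connected component) onto the same point. The helper names `overlap_components` and `summarize` are illustrative only.

```python
from itertools import combinations


def overlap_components(points, radius):
    """Group 1-D samples into connected components of a toy overlap graph.

    Toy assumption (not the paper's setup): augmenting x draws a view
    uniformly from [x - radius, x + radius], so the view supports of x and y
    overlap exactly when |x - y| <= 2 * radius.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of samples whose augmentation supports overlap.
    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= 2 * radius:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def summarize(points, labels, radius):
    """Return (number of components, whether every component is class-pure)."""
    comps = overlap_components(points, radius)
    pure = all(len({labels[i] for i in comp}) == 1 for comp in comps)
    return len(comps), pure


# Hypothetical data: class "a" spread over 0-3, class "b" over 10-12.
points = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "a", "b", "b", "b"]

for r in (0.25, 0.6, 4.0):
    n, pure = summarize(points, labels, r)
    print(f"radius={r}: components={n}, class-pure={pure}")
```

With weak augmentation (`radius=0.25`) no supports overlap and every sample stays isolated (7 components); a stronger augmentation (`radius=0.6`) chains each class into a single class-pure component (2 components); in this toy, an overly large radius (`4.0`) also links the two classes into one mixed component.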

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Face and Expression Recognition