On the Similarities of Embeddings in Contrastive Learning

Chungpa Lee; Sehee Lim; Kibok Lee; Jy-yong Sohn

arXiv:2506.09781 · cs.LG · July 16, 2025

TL;DR

This paper provides a unified cosine similarity framework for contrastive learning, revealing limitations in full-batch and mini-batch settings, and proposes an auxiliary loss to improve small-batch performance.

Contribution

The paper introduces a cosine-similarity-based theoretical framework for contrastive learning and proposes an auxiliary loss to improve representation quality in small-batch training.

Findings

01

In full-batch settings, perfect alignment of positive pairs is unattainable when negative-pair similarities fall below a threshold.

02

In mini-batch settings, smaller batch sizes induce higher variance in negative-pair similarities, which degrades the quality of the learned representations.

03

The proposed auxiliary loss improves small-batch contrastive learning results.
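
The quantities these findings refer to (positive-pair similarity, negative-pair similarity, and the variance of the latter) can be measured directly on a batch of embeddings. The following is a minimal sketch in plain Python, not code from the paper's repository: the function names `cosine` and `similarity_stats` are hypothetical, and treating only cross-view pairs of different instances as negatives is an assumption made for illustration.

```python
import math
from statistics import mean, pvariance


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_stats(view_a, view_b):
    """Summarize pair similarities for two views of the same batch.

    view_a[i] and view_b[i] embed the same instance (a positive pair).
    Assumption for this sketch: view_a[i] and view_b[j] with i != j
    are the negative pairs (cross-view negatives only).
    """
    n = len(view_a)
    positives = [cosine(view_a[i], view_b[i]) for i in range(n)]
    negatives = [
        cosine(view_a[i], view_b[j])
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return {
        "mean_positive": mean(positives),
        "mean_negative": mean(negatives),
        "var_negative": pvariance(negatives),  # population variance
    }


# Toy batch: three instances, two identical views each, embedded in 2-D.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
stats = similarity_stats(view_a, view_b)
```

In this toy batch the positive pairs are perfectly aligned (mean similarity 1), while the six negative-pair similarities are a mix of 0 and -1, so their variance is non-zero. Tracking `var_negative` across batch sizes is one simple way to observe the effect described in finding 02.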

Abstract

Contrastive learning operates on a simple yet effective principle: Embeddings of positive pairs are pulled together, while those of negative pairs are pushed apart. In this paper, we propose a unified framework for understanding contrastive learning through the lens of cosine similarity, and present two key theoretical insights derived from this framework. First, in full-batch settings, we show that perfect alignment of positive pairs is unattainable when negative-pair similarities fall below a threshold, and this misalignment can be mitigated by incorporating within-view negative pairs into the objective. Second, in mini-batch settings, smaller batch sizes induce stronger separation among negative pairs in the embedding space, i.e., higher variance in their similarities, which in turn degrades the quality of learned representations compared to full-batch settings. To address this, we…
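
To make the distinction between cross-view and within-view negative pairs concrete, here is a rough sketch of a generic InfoNCE-style loss with an optional within-view term. It is an illustration under stated assumptions, not necessarily the objective analyzed in the paper: the function name `contrastive_loss`, the `temperature` default of 0.5, and the `within_view` flag are all hypothetical choices.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(view_a, view_b, temperature=0.5, within_view=False):
    """Generic InfoNCE-style loss averaged over anchors taken from view_a.

    The positive for anchor view_a[i] is view_b[i]. Cross-view negatives are
    view_b[j] for j != i. If within_view is True, view_a[j] for j != i are
    added as extra negatives (an assumption about how such pairs might enter).
    """
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    n = len(a)
    total = 0.0
    for i in range(n):
        positive = dot(a[i], b[i]) / temperature
        # Cross-view logits; index j == i is the positive pair itself.
        logits = [dot(a[i], b[j]) / temperature for j in range(n)]
        if within_view:
            logits += [dot(a[i], a[j]) / temperature for j in range(n) if j != i]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - positive
    return total / n


# Toy batch: three instances, two slightly different views each.
view_a = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.2]]
view_b = [[0.9, 0.0], [0.1, 1.0], [-1.0, 0.0]]
loss_cross = contrastive_loss(view_a, view_b)
loss_within = contrastive_loss(view_a, view_b, within_view=True)
```

Because the within-view terms only add to the denominator of each softmax, `loss_within` is always at least `loss_cross` for the same embeddings; the gradient they contribute is what pushes same-view embeddings of different instances apart.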

Code & Models

Repositories

leechungpa/embedding-similarity-cl
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Face and Expression Recognition