Mini-Batch Optimization of Contrastive Loss

Jaewoong Cho; Kartik Sreenivasan; Keon Lee; Kyunghoo Mun; Soheun Yi; Jeong-Gwan Lee; Anna Lee; Jy-yong Sohn; Dimitris Papailiopoulos; Kangwook Lee

arXiv:2307.05906 · cs.LG · July 13, 2023 · 1 citation

TL;DR

This paper studies the theory of mini-batch optimization for contrastive learning. It characterizes when mini-batch optimization is equivalent to full-batch optimization, shows that training on high-loss mini-batches speeds up convergence, and proposes a spectral clustering method for finding such mini-batches.

Contribution

It provides a theoretical analysis of mini-batch contrastive learning, introduces a spectral clustering approach to identify high-loss mini-batches, and demonstrates improved convergence over standard SGD.
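The intuition behind using clustering to find high-loss mini-batches is that placing mutually similar samples in the same mini-batch makes the in-batch negatives hard, which raises the contrastive loss. Below is a minimal NumPy sketch of that idea under our own assumptions: the affinity $A_{ij} = e^{u_i^\top v_j} + e^{u_j^\top v_i}$, the use of recursive spectral bisection with the Fiedler vector, and the function names `affinity`, `spectral_bisect`, and `spectral_minibatches` are hypothetical illustrations, not the paper's algorithm or the API of the krafton-ai/mini-batch-cl repository.

```python
import numpy as np

def affinity(U, V):
    """Hypothetical symmetric affinity: A[i, j] = exp(u_i^T v_j) + exp(u_j^T v_i)."""
    S = np.exp(U @ V.T)
    A = S + S.T
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def spectral_bisect(A, idx):
    """Split the index list idx into two equal halves using the Fiedler vector."""
    sub = A[np.ix_(idx, idx)]
    L = np.diag(sub.sum(axis=1)) - sub   # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)          # eigenvalues come back in ascending order
    order = np.argsort(vecs[:, 1])       # sort nodes by the Fiedler vector
    half = len(idx) // 2
    return [idx[i] for i in order[:half]], [idx[i] for i in order[half:]]

def spectral_minibatches(U, V, B):
    """Recursively bisect [0, N) until every group holds B samples.

    Assumption: N / B is a power of two, so every split is exactly balanced.
    """
    N = U.shape[0]
    k = N // B
    assert N % B == 0 and (k & (k - 1)) == 0, "N / B must be a power of two"
    A = affinity(U, V)
    pending, batches = [list(range(N))], []
    while pending:
        group = pending.pop()
        if len(group) <= B:
            batches.append(sorted(group))
        else:
            pending.extend(spectral_bisect(A, group))
    return sorted(batches)
```

For example, with four samples forming two tight pairs, `spectral_minibatches(U, U, 2)` groups each pair into its own mini-batch, so both mini-batches contain a hard negative. Bisecting by sorting the Fiedler vector keeps mini-batch sizes equal, which plain k-means on a spectral embedding would not guarantee.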

Findings

1. Mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches are used.

2. Using high-loss mini-batches accelerates SGD convergence.

3. The proposed spectral clustering method outperforms vanilla SGD in experiments.
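To make "high-loss mini-batch" concrete, the sketch below scores every size-$B$ subset of a small dataset with a symmetric, CLIP-style contrastive loss and returns the worst one. The loss form and the names `contrastive_loss` and `highest_loss_batch` are assumptions for illustration, not taken from the paper. Exhaustive search visits all $\binom{N}{B}$ subsets, so it is only feasible for tiny $N$; that cost is what makes a cheaper heuristic, such as a spectral method, attractive.

```python
import itertools
import numpy as np

def contrastive_loss(U, V, idx):
    """Assumed symmetric InfoNCE loss restricted to the samples in idx."""
    idx = list(idx)
    logits = U[idx] @ V[idx].T  # logits[a, b] = u_{idx[a]}^T v_{idx[b]}

    def cross_entropy_diag(L):
        # Sum over rows of -log softmax(row)[diagonal entry], computed stably.
        L = L - L.max(axis=1, keepdims=True)
        log_probs = L - np.log(np.exp(L).sum(axis=1, keepdims=True))
        return -np.trace(log_probs)

    # u -> v direction uses rows of logits; v -> u direction uses rows of logits.T
    return cross_entropy_diag(logits) + cross_entropy_diag(logits.T)

def highest_loss_batch(U, V, B):
    """Brute-force search over all size-B subsets for the one with the largest loss."""
    N = U.shape[0]
    best = max(itertools.combinations(range(N), B),
               key=lambda s: contrastive_loss(U, V, s))
    return best, contrastive_loss(U, V, best)
```

For example, if samples 0 and 1 share the same embedding while samples 2 and 3 point in orthogonal directions, `highest_loss_batch(U, U, 2)` returns `(0, 1)` with loss $4\log 2$: each of the four rows is a uniform softmax over two equal logits.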

Abstract

Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views of the same object) are similar, while embeddings of negative pairs are dissimilar. Practical constraints such as large memory requirements make it challenging to consider all possible positive and negative pairs, leading to the use of mini-batch optimization. In this paper, we investigate the theoretical aspects of mini-batch optimization in contrastive learning. We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches are selected, while sub-optimality may arise when examining only a subset. We then demonstrate that utilizing high-loss mini-batches can speed up SGD convergence and…
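To make the $\binom{N}{B}$ count concrete, one way to write the objects involved is shown below, assuming a symmetric (CLIP-style) contrastive loss over paired embeddings $u_i, v_i$ for $N$ samples and mini-batch size $B$. The notation is ours and may differ from the paper's.

```latex
% Assumed CLIP-style loss on an index set S \subseteq [N] (notation ours)
\mathcal{L}_S(U, V) = -\sum_{i \in S} \log \frac{e^{u_i^\top v_i}}{\sum_{j \in S} e^{u_i^\top v_j}}
                      -\sum_{i \in S} \log \frac{e^{v_i^\top u_i}}{\sum_{j \in S} e^{v_i^\top u_j}}

% Full-batch loss: take S = [N] = \{1, \dots, N\}.
% Objective that uses every mini-batch of size B:
\mathcal{L}_{\mathrm{all}}(U, V) = \sum_{S \subseteq [N],\ |S| = B} \mathcal{L}_S(U, V),
\qquad \bigl|\{ S \subseteq [N] : |S| = B \}\bigr| = \binom{N}{B}

% Each pair i \neq j appears together in exactly \binom{N-2}{B-2} of these mini-batches.
```

The last identity holds because fixing $i$ and $j$ leaves $B-2$ slots to fill from the remaining $N-2$ indices. It shows that the all-mini-batch objective treats every pair alike, which is one intuition for the equivalence result; an arbitrary subset of mini-batches generally lacks this balance. This is an illustration, not the paper's proof.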

Code & Models

Repositories

krafton-ai/mini-batch-cl
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face and Expression Recognition · Remote-Sensing Image Classification

Methods: Stochastic Gradient Descent · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings