Adaptive Multi-head Contrastive Learning

Lei Wang; Piotr Koniusz; Tom Gedeon; Liang Zheng

arXiv:2310.05615 · cs.CV · September 24, 2024

TL;DR

Adaptive Multi-Head Contrastive Learning (AMCL) introduces multiple projection heads with adaptive temperatures to better handle diverse augmentations and intra/inter-sample similarities, improving contrastive learning performance.

Contribution

It proposes a novel multi-head contrastive framework with adaptive temperature re-weighting that enhances existing contrastive and non-contrastive methods such as SimCLR, MoCo, and Barlow Twins.
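The contribution above describes several projection heads, each scoring similarity with its own temperature. A minimal pure-Python sketch of that idea follows. It assumes bias-free linear heads, a simplified InfoNCE loss whose negatives are the other images in the batch, fixed hand-picked temperatures, and a plain average over heads. All helper names (`multi_head_loss`, `info_nce`, `linear`, `normalize`) are hypothetical, and the paper's actual adaptive temperature re-weighting is not reproduced here; in a real setup the heads would be MLPs trained by backpropagation and the temperatures would be adapted rather than fixed.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def linear(weights, v):
    """Hypothetical bias-free linear projection head: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def info_nce(z_a, z_b, tau):
    """Simplified InfoNCE: row i of z_a is the positive partner of row i of z_b;
    every other row of z_b acts as a negative. Returns the mean loss over the batch."""
    n = len(z_a)
    total = 0.0
    for i in range(n):
        logits = [sum(p * q for p, q in zip(z_a[i], z_b[k])) / tau for k in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denom - logits[i]
    return total / n

def multi_head_loss(feats_a, feats_b, heads, taus):
    """Assumed combination rule: average the per-head InfoNCE losses,
    where each head has its own temperature."""
    losses = []
    for weights, tau in zip(heads, taus):
        z_a = [normalize(linear(weights, f)) for f in feats_a]
        z_b = [normalize(linear(weights, f)) for f in feats_b]
        losses.append(info_nce(z_a, z_b, tau))
    return sum(losses) / len(losses), losses

# Usage with toy data (all sizes and values are illustrative assumptions).
random.seed(0)
dim, out_dim, batch, n_heads = 8, 4, 5, 3
feats_a = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]
# A small perturbation stands in for a second augmentation (hypothetical).
feats_b = [[x + 0.1 * random.gauss(0, 1) for x in f] for f in feats_a]
heads = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(out_dim)]
         for _ in range(n_heads)]
taus = [0.1, 0.5, 1.0]  # hypothetical fixed temperatures, one per head
mean_loss, per_head = multi_head_loss(feats_a, feats_b, heads, taus)
```

With a single head and a single temperature this reduces to an ordinary InfoNCE loss, which makes the multi-head version easy to compare against a baseline.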

Findings

1. Consistent performance improvement across various backbones.
2. Enhanced results with multiple augmentation strategies.
3. Effective in improving contrastive learning outcomes.

Abstract

In contrastive learning, two views of an original image, generated by different augmentations, are considered a positive pair, and their similarity is required to be high. Similarly, two views of distinct images form a negative pair, and their similarity is encouraged to be low. Typically, a single similarity measure, provided by a lone projection head, evaluates positive and negative sample pairs. However, due to diverse augmentation strategies and varying intra-sample similarity, views from the same image may not always be similar. Additionally, owing to inter-sample similarity, views from different images may be more similar than those from the same image. Consequently, enforcing high similarity for positive pairs and low similarity for negative pairs may be unattainable, and in some cases, such enforcement could detrimentally impact performance. To address this challenge, we propose using multiple…
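The single-head setup described in the abstract corresponds to the standard InfoNCE objective. As a reference point (a standard formulation, not the paper's proposed loss), take a batch of N images with two augmented views x_i and x_i', an encoder f, one projection head g, embeddings z_i = g(f(x_i)) and z_i' = g(f(x_i')), and a temperature τ. The per-sample loss can then be written as:

$$
\ell_i = -\log \frac{\exp\big(s(z_i, z_i')/\tau\big)}{\sum_{k=1}^{N} \exp\big(s(z_i, z_k')/\tau\big)},
\qquad
s(u, v) = \frac{u^\top v}{\lVert u \rVert \, \lVert v \rVert},
\qquad
\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \ell_i
$$

The numerator rewards high similarity for the positive pair (z_i, z_i'), while the denominator penalizes high similarity to the other images' views z_k' (k ≠ i), which act as negatives. SimCLR's NT-Xent variant additionally treats the first-view embeddings z_k (k ≠ i) as negatives; the simpler cross-view form is shown here for brevity.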

Peer Reviews

Decision: ICLR 2024 Conference, withdrawn submission

Reviewer 01 · Rating: 5 (marginally below the acceptance threshold) · Confidence: 4

Strengths

This paper proposes a plug-and-play approach consisting of a multi-head projection strategy and adaptive temperature scaling regularization. The approach could be adopted into most contrastive learning methods and consistently improves performance.

Weaknesses

1. My major concern is the unclear mechanisms of the multi-head projection strategy and adaptive temperature scaling regularization. Although the effectiveness of the two modules has been empirically verified by the performance improvement against baseline, it is still not clear enough why they work. In other words, some discussion on the mechanisms of the two modules could be further clarified and some qualitative analysis or examples of how the multiple projection heads and adaptive temperatur

Reviewer 02 · Rating: 3 (reject, not good enough) · Confidence: 4

Strengths

+ This paper proposes a new problem: how to refine the similarity metric between positive and negative pairs. Plain contrastive learning methods address it by fitting a global loss over the training dataset. This paper tries to tackle it with multiple projection heads.

Weaknesses

+ Empirically, multiple projection heads may not introduce more diversity into image representations, since a wider MLP projection head can achieve a similar goal. A wider MLP head also provides more capacity for image representations, yet the SimCLR paper reports that a wider MLP projection head does not yield more performance gain than a narrower one. So, as with a wider MLP projection head, I don't think multiple projection heads can address the proposed problem. + The relationship
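The comparison between several heads and one wider head can be made concrete with a rough parameter count. The sketch below assumes two-layer MLP heads with biases and illustrative sizes (2048-dimensional backbone features, hidden width 512, 128-dimensional output, K = 4 heads); these numbers and the helper names `mlp_params` and `compare` are hypothetical, not taken from the paper.

```python
def mlp_params(d_in, d_hidden, d_out):
    """Weights plus biases of a two-layer MLP: d_in -> d_hidden -> d_out."""
    return d_in * d_hidden + d_hidden + d_hidden * d_out + d_out

def compare(d_in, d_hidden, d_out, n_heads):
    """Parameter counts for K separate heads versus one head that is K times wider."""
    multi = n_heads * mlp_params(d_in, d_hidden, d_out)  # K independent heads
    wide = mlp_params(d_in, n_heads * d_hidden, d_out)   # single wide head
    return multi, wide

multi, wide = compare(2048, 512, 128, 4)
print(multi, wide)  # 4459008 4458624
```

Under these assumptions the two options differ only by (K - 1) × 128 = 384 output biases, so their parameter counts are essentially equal. The structural difference is that K heads produce K separate embeddings, and hence K similarity scores per pair, whereas the wide head produces one.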

Reviewer 03 · Rating: 3 (reject, not good enough) · Confidence: 4

Strengths

The proposed multi-head strategy could be applied to various contrastive and non-contrastive learning frameworks. Experiments on four datasets demonstrate the effectiveness of the proposed method.

Weaknesses

1. I do not quite see the novelty of this work. It seems that all this work does is replicate the projection heads. The authors try to motivate this modification through the similarity between positive and negative pairs. However, after reading the paper, I do not feel the observations in Fig. 1 correlate much with the proposed method. 2. As for the paper writing, probably due to the limited contributions in the method design, the authors spend a lot of space reviewing p

Code & Models

Repositories

leiwangr/cl
pytorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Fetal and Pediatric Neurological Disorders · Seismic Imaging and Inversion Techniques

Methods: 1x1 Convolution · Average Pooling · Bottleneck Residual Block · Residual Connection · Convolution · Dense Connections · Max Pooling · Global Average Pooling