When hard negative sampling meets supervised contrastive learning

Zijun Long; George Killick; Richard McCreadie; Gerardo Aragon-Camarasa; Zaiqiao Meng

arXiv:2308.14893 · cs.CV · August 30, 2023 · 1 citation

Open Access · 3 Reviews

TL;DR

This paper introduces SCHaNe, a supervised contrastive learning method that incorporates hard negative sampling during fine-tuning, leading to significant accuracy improvements on image classification benchmarks without extra resources.

Contribution

The paper proposes SCHaNe, a novel supervised contrastive loss that weights negative samples based on dissimilarity, improving model performance without additional architecture or data.

Findings

1. SCHaNe outperforms BEiT-3 in Top-1 accuracy across benchmarks.
2. Achieves up to 3.32% improvement in few-shot learning.
3. Sets a new state of the art with 86.14% accuracy on ImageNet-1k.

Abstract

State-of-the-art image models predominantly follow a two-stage strategy: pre-training on large datasets and fine-tuning with cross-entropy loss. Many studies have shown that using cross-entropy can result in sub-optimal generalisation and stability. While the supervised contrastive loss addresses some limitations of cross-entropy loss by focusing on intra-class similarities and inter-class differences, it neglects the importance of hard negative mining. We propose that models will benefit from performance improvement by weighting negative samples based on their dissimilarity to positive counterparts. In this paper, we introduce a new supervised contrastive learning objective, SCHaNe, which incorporates hard negative sampling during the fine-tuning phase. Without requiring specialized architectures, additional data, or extra computational resources, experimental results indicate that…
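The idea of weighting negatives inside a supervised contrastive objective can be illustrated with a minimal NumPy sketch. This is not the SCHaNe loss from the paper, whose exact formula is not reproduced on this page. It is a generic, assumed formulation in which each negative's weight grows with its similarity to the anchor. The function name `weighted_supcon_loss`, the `beta` hardness parameter, and the weight normalisation are all illustrative choices, not the authors' definitions.

```python
import numpy as np

def weighted_supcon_loss(embeddings, labels, temperature=0.1, beta=1.0):
    """Illustrative supervised contrastive loss with reweighted negatives.

    Assumption (not taken from the paper): a negative's weight is
    proportional to exp(beta * similarity_to_anchor), rescaled so the
    weights of each anchor's negatives average to 1.
    beta = 0 weights all negatives equally; larger beta emphasises
    negatives that lie close to the anchor (hard negatives).
    """
    labels = np.asarray(labels)
    n = len(labels)

    # L2-normalise so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature                    # shape (n, n)

    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(n, dtype=bool)         # same class, not self
    neg_mask = ~same                                 # different class

    # Hardness weights over negatives, mean 1 per anchor.
    w = np.where(neg_mask, np.exp(beta * sim), 0.0)
    neg_count = neg_mask.sum(axis=1, keepdims=True)
    w = w * neg_count / np.maximum(w.sum(axis=1, keepdims=True), 1e-12)

    exp_sim = np.exp(sim)
    neg_term = (w * exp_sim).sum(axis=1, keepdims=True)

    # For each (anchor, positive) pair: log of exp(s_ip) / (exp(s_ip) + neg_term_i).
    log_prob = sim - np.log(exp_sim + neg_term)

    # Average over positives per anchor, skipping anchors with no positive.
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    per_anchor = -(log_prob * pos_mask).sum(axis=1)[valid] / pos_count[valid]
    return per_anchor.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 4))
    lab = [0, 0, 1, 1, 2, 2, 3, 3]
    print(weighted_supcon_loss(emb, lab, beta=0.0))
    print(weighted_supcon_loss(emb, lab, beta=2.0))
```

In this sketch, raising `beta` shifts weight toward negatives that sit near the anchor in embedding space, so they contribute more to the denominator and the loss penalises them more strongly. The abstract describes weighting by dissimilarity to positive counterparts, so the paper's actual weighting scheme may differ from this one.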

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. The assumption that this paper aims to validate is both simple and easy to understand. Furthermore, the proposed objective is straightforward and intuitive.
2. There is a clear improvement in performance when compared with the conventional cross-entropy loss.
3. The paper is generally well-organized and presents its content logically.

Weaknesses

1. One of the main weaknesses I've identified is that the primary baseline used in this paper is Cross-Entropy (CE) loss, not Supervised Contrastive Learning (SupCon). If the paper's central claim is that 'introducing importance weights for negative samples based on their dissimilarity plays an important role,' then I believe SupCon should be the main baseline for comparison. Although SCHaNe outperforms SupCon in the few-shot setting as shown in Table 3, the inclusion of SupCon results in other

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. This paper proposes a novel supervised contrastive learning objective function, SCHaNe, which incorporates hard negative sampling during the fine-tuning phase.
2. The proposed method achieves state-of-the-art performance on ImageNet-1k and outperforms the strong baseline BEiT-3 in Top-1 accuracy across twelve benchmarks, with significant gains in few-shot learning settings and full-dataset fine-tuning.

Weaknesses

1. The paper could benefit from a more detailed comparison with

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The method has been validated on multiple datasets, and comprehensive experiments have been conducted on downstream datasets.
2. The work appears to be relatively comprehensive, with a clear motivation, detailed method description, and important parameter ablation experiments. Overall, it seems well-executed and promising.

Weaknesses

1. To enhance the credibility of the research, the authors should consider using the same base models (such as ViT or Swin) as other studies for the baselines in Table 1 and Table 2.
2. To provide a more comprehensive analysis, the results in Table 1 and Table 2 should include the performance of contrastive learning without the hard negative mining method.
3. The presentation of Formula 3 is currently unclear and should be clarified.
4. To provide

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Advanced Neural Network Applications

Methods: Contrastive Learning · Supervised Contrastive Loss · Balanced Selection