When hard negative sampling meets supervised contrastive learning
Zijun Long, George Killick, Richard McCreadie, Gerardo Aragon-Camarasa, Zaiqiao Meng

TL;DR
This paper introduces SCHaNe, a supervised contrastive learning method that incorporates hard negative sampling during fine-tuning, leading to significant accuracy improvements on image classification benchmarks without extra resources.
Contribution
The paper proposes SCHaNe, a novel supervised contrastive loss that weights negative samples based on their dissimilarity to positive samples, improving model performance without additional architecture or data.
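As a hedged illustration, one plausible way to write such an objective starts from the standard supervised contrastive (SupCon) loss and rescales each negative term by an importance weight. The weight w_ik and the hardness parameter β below are assumptions modelled on common hard-negative reweighting schemes, not the paper's exact formula. Here z are L2-normalised embeddings, τ is a temperature, P(i) is the set of other samples sharing anchor i's label, and N(i) is the set of samples with a different label.

```latex
\mathcal{L}_{\text{weighted}}
  = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)}
    \log \frac{\exp(z_i \cdot z_p / \tau)}
              {\sum_{a \in P(i)} \exp(z_i \cdot z_a / \tau)
               + \sum_{k \in N(i)} w_{ik} \exp(z_i \cdot z_k / \tau)},
\qquad
w_{ik} = \frac{|N(i)| \exp(\beta \, z_i \cdot z_k)}
              {\sum_{k' \in N(i)} \exp(\beta \, z_i \cdot z_{k'})}
```

Because the assumed weights average to 1 over N(i), setting β = 0 recovers plain SupCon, while β > 0 up-weights negatives that lie close to the anchor in embedding space.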
Findings
SCHaNe outperforms BEiT-3 in Top-1 accuracy across twelve benchmarks.
Achieves up to 3.32% improvement in few-shot learning.
Sets new state-of-the-art with 86.14% accuracy on ImageNet-1k.
Abstract
State-of-the-art image models predominantly follow a two-stage strategy: pre-training on large datasets and fine-tuning with cross-entropy loss. Many studies have shown that cross-entropy can result in sub-optimal generalisation and stability. While the supervised contrastive loss addresses some limitations of cross-entropy loss by focusing on intra-class similarities and inter-class differences, it neglects the importance of hard negative mining. We hypothesise that models can achieve better performance when negative samples are weighted based on their dissimilarity to positive counterparts. In this paper, we introduce a new supervised contrastive learning objective, SCHaNe, which incorporates hard negative sampling during the fine-tuning phase. Without requiring specialised architectures, additional data, or extra computational resources, experimental results indicate that…
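The following is a minimal, dependency-free Python sketch of a supervised contrastive loss with hard-negative reweighting. It assumes a weighting scheme in which each negative's weight grows with its similarity to the anchor and the weights are rescaled to average 1; the function name `weighted_supcon_loss`, the hardness parameter `beta` and this normalisation are hypothetical illustrations, not the authors' implementation. A real fine-tuning setup would use a tensor library and batched matrix operations instead of Python loops.

```python
import math


def _dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def weighted_supcon_loss(embeddings, labels, tau=0.1, beta=1.0):
    """Supervised contrastive loss with hard-negative weights (hypothetical sketch).

    Assumed weighting: negative k of anchor i gets
        w_ik = |N(i)| * exp(beta * s_ik) / sum_k' exp(beta * s_ik'),
    so the weights average to 1 and beta = 0 recovers plain SupCon.
    Returns the mean loss over anchors that have at least one positive.
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [k for k in range(n) if labels[k] != labels[i]]
        if not pos:
            continue  # anchors without a positive contribute no term
        sim = [_dot(z[i], z[j]) for j in range(n)]  # cosine similarities
        denom = sum(math.exp(sim[a] / tau) for a in pos)
        if neg:
            # Hardness weights: more similar (harder) negatives get larger weights.
            raw = [math.exp(beta * sim[k]) for k in neg]
            scale = len(neg) / sum(raw)  # rescale so the weights average to 1
            denom += sum(scale * r * math.exp(sim[k] / tau) for r, k in zip(raw, neg))
        log_denom = math.log(denom)
        total += -sum(sim[p] / tau - log_denom for p in pos) / len(pos)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Calling it with `beta=0.0` gives the unweighted SupCon baseline. Increasing `beta` cannot shrink any anchor's denominator, because the weights rise together with the negatives' similarity, so harder negatives contribute more to the loss.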
Peer Reviews
Decision · Submitted to ICLR 2024
1. The assumption that this paper aims to validate is both simple and easy to understand. Furthermore, the proposed objective is straightforward and intuitive. 2. There is a clear improvement in performance when compared with the conventional cross-entropy loss. 3. The paper is generally well-organized and presents its content logically.
1. One of the main weaknesses I've identified is that the primary baseline used in this paper is Cross-Entropy (CE) loss, not Supervised Contrastive Learning (SupCon). If the paper's central claim is that 'introducing importance weights for negative samples based on their dissimilarity plays an important role,' then I believe SupCon should be the main baseline for comparison. Although SCHaNe outperforms SupCon in the few-shot setting as shown in Table 3, the inclusion of SupCon results in other
Strengths: 1. This paper proposes a novel supervised contrastive learning objective function, SCHaNe, which incorporates hard negative sampling during the fine-tuning phase. 2. The proposed method achieves state-of-the-art performance on ImageNet-1k and outperforms the strong baseline BEiT-3 in Top-1 accuracy across twelve benchmarks, with significant gains in few-shot learning settings and full-dataset fine-tuning. Weaknesses: 1. The paper could benefit from a more detailed comparison with
1. The method has been validated on multiple datasets, and comprehensive experiments have been conducted on downstream datasets. 2. The work appears to be relatively comprehensive, with a clear motivation, detailed method description, and important parameter ablation experiments. Overall, it seems well-executed and promising.
1. To enhance the credibility of the research, the authors should consider using the same base models (such as ViT or Swin) as other studies for the baselines in Table 1 and Table 2. 2. To provide a more comprehensive analysis, the results in Table 1 and Table 2 should include the performance of contrastive learning without the hard negative mining method. 3. The presentation of Formula 3 should be clarified, as it is currently not very clear. 4. To provide
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Advanced Neural Network Applications
Methods: Contrastive Learning · Supervised Contrastive Loss · Balanced Selection
