TL;DR
This paper investigates how the size of negative sample queues affects contrastive learning for sentence embedding, proposing MoCoSE with mechanisms to optimize the use of historical negative samples and demonstrating improved performance on semantic similarity tasks.
Contribution
It introduces MoCoSE, a momentum contrastive learning model with a novel maximum traceable distance metric to optimize negative sample utilization in sentence embedding.
Findings
Performance improves when the negative sample queue size falls within an optimal range.
Maximum traceable distance correlates with better embeddings.
Achieves a Spearman's correlation of 77.27% on the semantic textual similarity (STS) task.
Abstract
Contrastive learning is emerging as a powerful technique for extracting knowledge from unlabeled data. This technique requires a balanced mixture of two ingredients: positive (similar) and negative (dissimilar) samples. This is typically achieved by maintaining a queue of negative samples during training. Prior work in this area typically uses a fixed-length negative sample queue, but how the negative sample size affects model performance remains unclear. This unclear impact of the number of negative samples on contrastive learning performance motivated our in-depth study. This paper presents a momentum contrastive learning model with a negative sample queue for sentence embedding, namely MoCoSE. We add a prediction layer to the online branch to make the model asymmetric and, together with the EMA update mechanism of the target branch, to prevent the model from…
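The mechanics named in the abstract (an EMA-updated target branch, a fixed-length queue of negative samples, and a contrastive loss over one positive plus the queued negatives) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the momentum `beta=0.99`, the temperature `0.05`, the queue length, and the random vectors standing in for sentence embeddings are all assumptions chosen for illustration, and the prediction layer is omitted.

```python
import math
import random
from collections import deque


def ema_update(target, online, beta=0.99):
    """EMA step for the target branch: t <- beta * t + (1 - beta) * o.
    beta is a hypothetical momentum value, not taken from the paper."""
    return [beta * t + (1.0 - beta) * o for t, o in zip(target, online)]


def normalize(v):
    """Scale a vector to unit length (guarding against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss: cross-entropy of selecting the positive among
    the positive and all queued negatives, using cosine similarity."""
    q = normalize(query)
    logits = [dot(q, normalize(positive)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Fixed-length FIFO queue: once full, the oldest negatives are dropped.
QUEUE_SIZE = 4  # hypothetical; real queues hold far more entries
queue = deque(maxlen=QUEUE_SIZE)

rng = random.Random(0)
for step in range(6):
    # Random vectors stand in for target-branch sentence embeddings.
    batch = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2)]
    queue.extend(batch)

query = [rng.gauss(0, 1) for _ in range(8)]
positive = [x + 0.1 * rng.gauss(0, 1) for x in query]  # noisy view of the query
loss = info_nce(query, positive, list(queue))
```

In this sketch the queue length controls how many past batches still contribute negatives, which is the quantity whose effect on performance the paper sets out to study.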
Taxonomy
Methods: Contrastive Learning
