Instance Smoothed Contrastive Learning for Unsupervised Sentence Embedding
Hongliang He, Junlei Zhang, Zhenzhong Lan, Yue Zhang

TL;DR
This paper introduces IS-CSE, a novel contrastive learning method that smooths sentence embeddings by aggregating similar instances, leading to improved unsupervised sentence embedding performance.
Contribution
The study proposes a new instance smoothing technique for contrastive learning that enhances generalization in unsupervised sentence embeddings.
Findings
Achieves state-of-the-art results on STS tasks with BERT and RoBERTa models.
Improves Spearman's correlation by over 2% on average.
Demonstrates better generalization through embedding smoothing.
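For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of how Spearman's correlation is typically computed between predicted sentence-pair similarities and human-annotated gold scores. It is not the paper's evaluation code. The function names and the four example scores are hypothetical and chosen only for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: model cosine similarities vs. gold similarity ratings.
predicted = [0.91, 0.35, 0.60, 0.10]
gold = [4.8, 2.0, 1.5, 0.4]
rho = spearman(predicted, gold)
print(round(rho, 4))  # 0.8 for these made-up scores
```

Because the metric depends only on rank order, it rewards a model for ordering sentence pairs the way annotators do, regardless of the absolute scale of its similarity scores.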
Abstract
Contrastive learning-based methods, such as unsup-SimCSE, have achieved state-of-the-art (SOTA) performance in learning unsupervised sentence embeddings. However, in previous studies, each embedding used for contrastive learning is derived from only one sentence instance; we call these embeddings instance-level embeddings. In other words, each embedding is regarded as a unique class of its own, which may hurt generalization performance. In this study, we propose IS-CSE (instance smoothing contrastive sentence embedding) to smooth the boundaries of embeddings in the feature space. Specifically, we retrieve embeddings from a dynamic memory buffer according to semantic similarity to obtain a positive embedding group. The embeddings in the group are then aggregated by a self-attention operation to produce a smoothed instance embedding for further analysis. We evaluate our method on…
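The retrieve-then-aggregate idea described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation. It assumes a fixed-size first-in-first-out buffer, cosine similarity for retrieval, and a simplified single-query softmax attention with no learned projections in place of the paper's self-attention. All names, the buffer size, `k`, and `temperature` are hypothetical.

```python
import math
from collections import deque

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_top_k(query, buffer, k):
    """Return the k buffer embeddings most similar to the query."""
    ranked = sorted(buffer, key=lambda e: cosine(query, e), reverse=True)
    return ranked[:k]

def smooth_instance(query, buffer, k=2, temperature=1.0):
    """Aggregate the query and its k nearest buffer entries.

    Assumption: softmax over cosine similarities stands in for the
    self-attention aggregation mentioned in the abstract.
    """
    group = [query] + retrieve_top_k(query, buffer, k)
    scores = [cosine(query, e) / temperature for e in group]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [x / total for x in exps]
    smoothed = [
        sum(w * e[i] for w, e in zip(weights, group))
        for i in range(len(query))
    ]
    return smoothed, weights

# Hypothetical dynamic buffer: the oldest embeddings drop out once it is full.
buffer = deque(maxlen=4)
for emb in [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]:
    buffer.append(emb)

query = [1.0, 0.05]
smoothed, weights = smooth_instance(query, buffer, k=2)
```

In this toy setup the smoothed vector is a weighted average of the query and its two nearest neighbors, so it no longer represents a single instance. In a training loop, new embeddings would be appended to the buffer after each step.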
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
Methods: Balanced Selection · Contrastive Learning
