ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding
Xing Wu, Chaochen Gao, Liangjun Zang, Jizhong Han, Zhongyuan Wang, Songlin Hu

TL;DR
ESimCSE is a contrastive learning method for unsupervised sentence embeddings. It builds positive pairs with word repetition and adds momentum contrast to supply more negative pairs, which improves semantic similarity performance.
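Momentum contrast keeps a second, slowly updated copy of the encoder and a fixed-size FIFO queue of that copy's output embeddings from earlier mini-batches. The queued embeddings serve as extra negatives and receive no gradient. The sketch below shows the two pieces in plain Python, under stated assumptions: parameters are modelled as a flat list of floats, and the names `momentum_update`, `NegativeQueue`, `lam` and `max_size` are hypothetical. `lam=0.995` is an illustrative value, not necessarily the paper's setting.

```python
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def momentum_update(momentum_params: List[float],
                    encoder_params: Sequence[float],
                    lam: float = 0.995) -> None:
    """In-place exponential moving average: theta_m <- lam * theta_m + (1 - lam) * theta_e.

    Assumption: parameters are a flat list of floats; a real model would
    apply this to every weight tensor of the momentum encoder.
    """
    for i, p in enumerate(encoder_params):
        momentum_params[i] = lam * momentum_params[i] + (1.0 - lam) * p


class NegativeQueue:
    """FIFO queue of past momentum-encoder embeddings used as extra negatives."""

    def __init__(self, max_size: int) -> None:
        # deque(maxlen=...) silently drops the oldest entries once it is full
        self._q: Deque[Vector] = deque(maxlen=max_size)

    def enqueue_batch(self, embeddings: Sequence[Vector]) -> None:
        # Store copies so later mutation by the caller cannot change the queue
        for e in embeddings:
            self._q.append(list(e))

    def negatives(self) -> List[Vector]:
        # Oldest first, newest last
        return list(self._q)

    def __len__(self) -> int:
        return len(self._q)
```

Using a bounded deque keeps enqueueing O(1) and evicts the stalest embeddings automatically. Those embeddings came from the oldest momentum-encoder state, so they are the least consistent with the current encoder.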
Contribution
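The paper proposes ESimCSE, which extends unsupervised SimCSE in two ways. Word repetition builds positive pairs whose two sides differ in length, which reduces length bias. Momentum contrast supplies additional negative pairs. Together these improve embedding quality.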
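Word repetition duplicates a few randomly chosen tokens in place. This changes the sentence's length while leaving its meaning almost unchanged, so the two sides of a positive pair no longer share identical length information. The sketch below is a minimal version under stated assumptions:

- The number of duplicated tokens is drawn uniformly from [0, max(2, int(dup_rate * n))] and capped at n.
- It operates on whatever token list it is given, such as whitespace-split words or subwords.
- `word_repetition` and `dup_rate=0.32` are hypothetical, illustrative choices and should be tuned.

```python
import random
from typing import List, Optional


def word_repetition(tokens: List[str], dup_rate: float = 0.32,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Return a copy of `tokens` in which a few random tokens appear twice in a row."""
    rng = rng or random.Random()
    n = len(tokens)
    if n == 0:
        return []
    # How many positions to duplicate: uniform over [0, max(2, int(dup_rate * n))],
    # capped at n so we never request more positions than exist.
    dup_len = min(n, rng.randint(0, max(2, int(dup_rate * n))))
    dup_set = set(rng.sample(range(n), dup_len))
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i in dup_set:
            out.append(tok)  # repeat the token immediately after itself
    return out


if __name__ == "__main__":
    sentence = "a man is playing a guitar".split()
    print(word_repetition(sentence, rng=random.Random(0)))
```

Duplicating adjacent tokens, rather than inserting new words, keeps the augmented sentence's vocabulary identical to the original's. The semantic drift is therefore small while the length changes.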
Findings
ESimCSE outperforms unsup-SimCSE by an average of 2.02% Spearman correlation on semantic textual similarity (STS) benchmarks with BERT-base.
The method effectively reduces length bias in sentence embeddings.
Experimental results show improved semantic similarity performance, measured by Spearman correlation.
Abstract
Contrastive learning has attracted much attention for learning unsupervised sentence embeddings. The current state-of-the-art unsupervised method is unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE uses dropout as a minimal data augmentation. It passes the same input sentence through a pre-trained Transformer encoder (with dropout turned on) twice, and the two resulting embeddings form a positive pair. The Transformer's position embeddings generally encode a sentence's length into its embedding. As a result, both sides of each positive pair in unsup-SimCSE carry the same length information. Unsup-SimCSE trained on these pairs is therefore likely to be biased: it tends to treat sentences of the same or similar length as more semantically similar. Through statistical observations, we find that…
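The objective behind these positive pairs is an InfoNCE loss over a batch. Assume z1[i] and z2[i] are the two dropout-noised embeddings of sentence i, and every z2[j] with j != i acts as an in-batch negative. The loss for sentence i is then -log(exp(sim(z1[i], z2[i]) / tau) / sum_j exp(sim(z1[i], z2[j]) / tau)), with cosine similarity as sim. The pure-Python sketch below computes that loss, assuming tau = 0.05 and hypothetical function names. It also takes an optional `extra_negatives` list, a hypothetical parameter standing in for the queued momentum-encoder embeddings that ESimCSE appends to the denominator.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; the norm is floored to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, 1e-12)


def contrastive_loss(z1: List[Vector], z2: List[Vector], tau: float = 0.05,
                     extra_negatives: Sequence[Vector] = ()) -> float:
    """Mean InfoNCE loss; z1[i] and z2[i] are two dropout views of sentence i."""
    n = len(z1)
    total = 0.0
    for i in range(n):
        # Column i is the positive; all other columns (and extras) are negatives
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        logits += [cosine(z1[i], q) / tau for q in extra_negatives]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n
```

Because each positive pair comes from the same sentence, both views share the same length information. Nothing in this loss penalizes the encoder for relying on that shortcut, which is the bias the abstract describes.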
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · SimCSE · Layer Normalization · Softmax · Label Smoothing · Byte Pair Encoding
