ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding

Xing Wu; Chaochen Gao; Liangjun Zang; Jizhong Han; Zhongyuan Wang; Songlin Hu

arXiv:2109.04380·cs.CL·September 13, 2022·69 cites

TL;DR

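ESimCSE improves contrastive learning of unsupervised sentence embeddings in two ways: it builds positive pairs with a repetition operation that changes the length of the input sentence, and it uses momentum contrast to enlarge the set of negative pairs. Together these improve performance on semantic similarity tasks.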

Contribution

The paper proposes ESimCSE, which enhances unsupervised SimCSE with word repetition (for positive pairs) and momentum contrast (for negative pairs) to reduce length bias and improve embedding quality.
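The two ingredients named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `dup_rate` parameter, the rule for choosing how many tokens to repeat, and the momentum value `m` are all assumptions made for illustration.

```python
import random


def repeat_words(tokens, dup_rate=0.3, rng=random):
    """Return a copy of `tokens` with a few randomly chosen tokens repeated.

    The repeated sentence keeps its meaning but has a different length,
    so a positive pair no longer shares the same length information.
    `dup_rate` and the sampling rule below are illustrative assumptions.
    """
    n = len(tokens)
    if n == 0:
        return []
    # Assumed upper bound on repeated tokens: at least 2, never more than n.
    max_dup = min(n, max(2, int(dup_rate * n)))
    dup_len = rng.randint(0, max_dup)
    dup_positions = set(rng.sample(range(n), dup_len))
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i in dup_positions:
            out.append(tok)  # repeat the token right after itself
    return out


def momentum_update(momentum_params, encoder_params, m=0.995):
    """Move the momentum encoder's parameters slowly toward the encoder's.

    Parameters are plain lists of floats here; `m` is an assumed value.
    """
    return [m * pm + (1.0 - m) * pe
            for pm, pe in zip(momentum_params, encoder_params)]
```

For example, `repeat_words("I like this apple".split())` may return the sentence unchanged or with one or two words doubled, depending on the random draw. In a momentum-contrast setup, embeddings produced by the slowly updated encoder for earlier batches would be kept in a queue and reused as extra negatives.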

Findings

01

ESimCSE outperforms unsup-SimCSE by an average of 2.02% in Spearman correlation with BERT-base.

02

The method effectively reduces length bias in sentence embeddings.

03

Experiments on semantic textual similarity tasks show improved performance over unsup-SimCSE.

Abstract

Contrastive learning has been attracting much attention for learning unsupervised sentence embeddings. The current state-of-the-art unsupervised method is the unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE takes dropout as a minimal data augmentation method, and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain the two corresponding embeddings to build a positive pair. As the length information of a sentence will generally be encoded into the sentence embeddings due to the usage of position embedding in Transformer, each positive pair in unsup-SimCSE actually contains the same length information. And thus unsup-SimCSE trained with these positive pairs is probably biased, which would tend to consider that sentences of the same or similar length are more similar in semantics. Through statistical observations, we find that…
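The positive-pair construction that the abstract attributes to unsup-SimCSE can be sketched as follows. This is a toy illustration, not the published code: the linear "encoder", the dropout rate, the temperature, and all function names are assumptions, and a real setup would use a pre-trained Transformer.

```python
import math
import random


def encode(vec, weights, drop_p, rng):
    """Toy encoder: dropout on the input features, then a linear map."""
    kept = [0.0 if rng.random() < drop_p else x / (1.0 - drop_p) for x in vec]
    return [sum(w * x for w, x in zip(row, kept)) for row in weights]


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)


def info_nce(first, second, temperature=0.05):
    """Mean contrastive loss: first[i] should match second[i]
    against every other second[j] in the batch (in-batch negatives)."""
    losses = []
    for i, h in enumerate(first):
        logits = [cosine(h, g) / temperature for g in second]
        top = max(logits)  # subtract the max for numerical stability
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


rng = random.Random(0)
dim = 8
weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
batch = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]

# The same inputs are encoded twice; only the dropout masks differ.
view_a = [encode(x, weights, 0.1, rng) for x in batch]
view_b = [encode(x, weights, 0.1, rng) for x in batch]
loss = info_nce(view_a, view_b)
```

Because both views come from the identical input, the two members of each positive pair always have the same length, which is the source of the bias the abstract describes.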

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · SimCSE · Layer Normalization · Softmax · Label Smoothing · Byte Pair Encoding