S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence Embedding

Junlei Zhang; Zhenzhong Lan

arXiv:2111.11750 · cs.CL · November 25, 2021 · 5 cites

TL;DR

S-SimCSE introduces a novel contrastive learning approach that samples sub-networks with varying dropout rates to improve sentence embedding quality, outperforming previous methods like SimCSE.

Contribution

The paper proposes sampling dropout rates from a distribution and a sentence-wise mask strategy to enhance contrastive learning of sentence embeddings.

Findings

01. S-SimCSE outperforms SimCSE by over 1% on BERT_base.

02. Sampling dropout rates improves embedding consistency.

03. The method improves performance on semantic textual similarity tasks.

Abstract

Contrastive learning has been studied for improving the performance of learning sentence embeddings. The current state-of-the-art method is SimCSE, which uses dropout as the data augmentation method and feeds the same input sentence to a pre-trained transformer encoder twice. The corresponding outputs, two sentence embeddings derived from the same sentence with different dropout masks, can be used to build a positive pair. A network with a dropout mask applied can be regarded as a sub-network of itself, whose expected scale is determined by the dropout rate. In this paper, we push sub-networks with different expected scales to learn similar embeddings for the same sentence. SimCSE failed to do so because it fixed the dropout rate to a tuned hyperparameter. We achieve this by sampling the dropout rate from a distribution in each forward pass. As this method may make optimization…
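The idea in the abstract, two forward passes over the same sentence with a freshly sampled dropout rate each time, can be illustrated with a minimal sketch. This is not the authors' implementation: the uniform range for the dropout rate, the toy one-layer `encode` function standing in for a transformer encoder, and every function name here are assumptions made for illustration only.

```python
import math
import random


def sample_dropout_rate(rng, low=0.0, high=0.2):
    """Draw one dropout rate per forward pass.

    The uniform range [low, high] is a hypothetical choice; the abstract
    only says the rate is sampled from a distribution.
    """
    return rng.uniform(low, high)


def dropout(vec, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def encode(features, weights, rate, rng):
    """Toy stand-in for an encoder: dropout followed by one linear layer."""
    dropped = dropout(features, rate, rng)
    return [sum(w * x for w, x in zip(row, dropped)) for row in weights]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def positive_pair(features, weights, rng):
    """Encode the same input twice, each time with its own sampled rate."""
    rates = (sample_dropout_rate(rng), sample_dropout_rate(rng))
    views = tuple(encode(features, weights, r, rng) for r in rates)
    return rates, views


rng = random.Random(0)
features = [rng.gauss(0, 1) for _ in range(16)]            # fake sentence features
weights = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(4)]

rates, (view_a, view_b) = positive_pair(features, weights, rng)
print("sampled rates:", rates)
print("similarity of the positive pair:", cosine(view_a, view_b))
```

In a contrastive setup the two views would be pulled together by the training loss while views of other sentences in the batch are pushed apart; with a fixed rate, as in SimCSE, both calls to `encode` would receive the same `rate` value.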

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections · Softmax · Residual Connection · Layer Normalization · Adam