S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence Embedding
Junlei Zhang, Zhenzhong Lan

TL;DR
S-SimCSE introduces a novel contrastive learning approach that samples sub-networks with varying dropout rates to improve sentence embedding quality, outperforming previous methods like SimCSE.
Contribution
The paper proposes sampling dropout rates from a distribution and a sentence-wise mask strategy to enhance contrastive learning of sentence embeddings.
Findings
S-SimCSE outperforms SimCSE by over 1% on BERT_base.
Sampling dropout rates improves embedding consistency.
The method enhances semantic text similarity performance.
Abstract
Contrastive learning has been studied for improving the performance of learning sentence embeddings. The current state-of-the-art method is SimCSE, which takes dropout as the data augmentation method and feeds a pre-trained transformer encoder the same input sentence twice. The corresponding outputs, two sentence embeddings derived from the same sentence with different dropout masks, can be used to build a positive pair. A network with a dropout mask applied can be regarded as a sub-network of itself, whose expected scale is determined by the dropout rate. In this paper, we push sub-networks with different expected scales to learn similar embeddings for the same sentence. SimCSE fails to do so because it fixes the dropout rate to a tuned hyperparameter. We achieve this by sampling the dropout rate from a distribution at each forward process. As this method may make optimization…
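The idea in the abstract, drawing a fresh dropout rate for every forward pass so that the two views of a sentence come from sub-networks of different expected scales, can be illustrated with a minimal sketch. This is not the authors' implementation: the uniform distribution, the [0.0, 0.2] range, the toy feature vector standing in for encoder activations, and the names sample_dropout_rate, apply_dropout, and positive_pair are all assumptions made for illustration.

```python
import random


def sample_dropout_rate(rng, low=0.0, high=0.2):
    """Draw one dropout rate for a single forward pass.

    A uniform distribution over [low, high] is an assumption made here;
    the summary above does not say which distribution the paper uses.
    """
    return rng.uniform(low, high)


def apply_dropout(vector, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`,
    and rescale the surviving units by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]


def positive_pair(features, rng, low=0.0, high=0.2):
    """Build two views of the same input, each with its own sampled rate.

    With a fixed rate (the SimCSE setting described above) both views
    would share the same expected sub-network scale; here they may differ.
    """
    rate_a = sample_dropout_rate(rng, low, high)
    rate_b = sample_dropout_rate(rng, low, high)
    view_a = apply_dropout(features, rate_a, rng)
    view_b = apply_dropout(features, rate_b, rng)
    return (rate_a, view_a), (rate_b, view_b)


# Toy stand-in for the hidden activations of one sentence (hypothetical values).
rng = random.Random(0)
features = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75]
(rate_a, view_a), (rate_b, view_b) = positive_pair(features, rng)
print(f"rate_a={rate_a:.3f} view_a={view_a}")
print(f"rate_b={rate_b:.3f} view_b={view_b}")
```

In a real encoder the sampled rate would be applied to the dropout layers inside the transformer rather than to a single vector, and the two resulting sentence embeddings would then be treated as a positive pair by the contrastive loss.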
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections · Softmax · Residual Connection · Layer Normalization · Adam
