Improving Contrastive Learning of Sentence Embeddings with Case-Augmented Positives and Retrieved Negatives

Wei Wang; Liangzhu Ge; Jingqiao Zhang; Cheng Yang

arXiv:2206.02457 · cs.CL · June 7, 2022

TL;DR

This paper introduces CARDS, a contrastive learning method that enhances sentence embeddings by using case-augmented positives and retrieved hard negatives, achieving state-of-the-art results in unsupervised settings.

Contribution

It proposes novel case augmentation and hard negative sampling techniques to improve the quality of contrastive learning for sentence embeddings.
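The case augmentation named here flips the case of the first letter of randomly selected words, as the abstract describes. The Python sketch below illustrates that idea and is not the code from alibaba/simcse-with-cards. The function name `switch_case`, the per-word flip probability `prob` (and its default), and whitespace tokenization are all our assumptions; the paper's actual flip rate and tokenization may differ.

```python
import random
from typing import Optional


def switch_case(sentence: str, prob: float = 0.1,
                rng: Optional[random.Random] = None) -> str:
    """Flip the case of the first letter of randomly chosen words.

    Hypothetical assumptions (not taken from the paper): words are split
    on whitespace, and each word that starts with a letter is flipped
    independently with probability `prob`.
    """
    if rng is None:
        rng = random.Random()
    words = []
    for word in sentence.split():
        # Only words that start with a letter have a case to flip.
        if word[0].isalpha() and rng.random() < prob:
            word = word[0].swapcase() + word[1:]
        words.append(word)
    # Note: rejoining with single spaces normalizes the original whitespace.
    return " ".join(words)
```

Passing a seeded `random.Random` makes the augmentation reproducible. With `prob=1.0`, every word that starts with a letter flips, so `switch_case("the Cat sat", prob=1.0)` returns `"The cat Sat"`.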

Findings

01. CARDS outperforms previous SOTA methods on STS benchmarks
02. Case augmentation reduces bias in token embeddings
03. Hard negative sampling improves embedding discrimination
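Retrieving hard negatives from the whole dataset amounts to a nearest-neighbour search over sentence embeddings produced by a pre-trained model. The sketch below assumes those embeddings are already computed, with toy vectors standing in for them. As a hypothetical policy, it samples one negative uniformly from the `k` most similar other sentences. The paper's exact encoder, candidate pool, and sampling rule may differ, and the helper names `cosine` and `retrieve_hard_negatives` are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_hard_negatives(embeddings: Sequence[Sequence[float]], k: int = 2,
                            rng: Optional[random.Random] = None) -> List[int]:
    """For each sentence, return the index of one retrieved hard negative.

    Hypothetical policy: rank all *other* sentences by cosine similarity
    and sample uniformly from the top-k candidates.
    """
    if len(embeddings) < 2 or k < 1:
        raise ValueError("need at least two embeddings and k >= 1")
    if rng is None:
        rng = random.Random()
    negatives = []
    for i, query in enumerate(embeddings):
        # Rank every other sentence by similarity to the query (self excluded).
        candidates = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine(query, embeddings[j]),
            reverse=True,
        )
        # Sample one hard negative from the k most similar sentences.
        negatives.append(rng.choice(candidates[:k]))
    return negatives
```

The brute-force ranking is O(n²) in the dataset size. At corpus scale, an approximate nearest-neighbour index would normally replace it. Sampling from the top `k` rather than always taking the single nearest neighbour is one way to lower the chance of picking a near-paraphrase as a false negative.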

Abstract

Following SimCSE, contrastive-learning-based methods have achieved state-of-the-art (SOTA) performance in learning sentence embeddings. However, unsupervised contrastive learning methods still lag far behind their supervised counterparts. We attribute this to the quality of positive and negative samples, and aim to improve both. Specifically, for positive samples, we propose switch-case augmentation to flip the case of the first letter of randomly selected words in a sentence. This is to counteract the intrinsic bias of pre-trained token embeddings toward frequency, word case, and subwords. For negative samples, we sample hard negatives from the whole dataset based on a pre-trained language model. Combining the above two methods with SimCSE, our proposed Contrastive learning with Augmented and Retrieved Data for Sentence embedding (CARDS) method significantly surpasses the current…
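Combining both kinds of samples with SimCSE can be read as training with an InfoNCE objective. Its denominator contains every in-batch positive plus the retrieved hard negatives, as in SimCSE's hard-negative formulation. The sketch below computes that loss for plain Python vectors. It assumes that the positive for anchor i is the embedding of an augmented view of the same sentence (for example, its switch-case variant) and uses the temperature τ = 0.05 from SimCSE. The function name `contrastive_loss` is hypothetical, and this is an illustration of the objective, not the CARDS training code.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def contrastive_loss(anchors: Sequence[Sequence[float]],
                     positives: Sequence[Sequence[float]],
                     hard_negatives: Sequence[Sequence[float]],
                     tau: float = 0.05) -> float:
    """Mean InfoNCE loss with in-batch positives and hard negatives.

    Per anchor i:
      -log( exp(s(h_i, h_i+)/tau)
            / sum_j [exp(s(h_i, h_j+)/tau) + exp(s(h_i, h_j-)/tau)] )
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Logits against every in-batch positive, then every hard negative.
        logits = [cosine(anchor, p) / tau for p in positives]
        logits += [cosine(anchor, h) / tau for h in hard_negatives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # logits[i] is the similarity to anchor i's own positive.
        total += log_denominator - logits[i]
    return total / len(anchors)
```

Hard negatives only add terms to the denominator. A retrieved sentence that is very similar to the anchor therefore raises the loss until the encoder pushes it away, which is the discrimination effect the method aims for.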

Code & Models

Repositories

alibaba/simcse-with-cards
PyTorch · Official

Taxonomy

Methods: FLIP · Contrastive Learning · SimCSE