Debiased Contrastive Learning of Unsupervised Sentence Representations

Kun Zhou; Beichen Zhang; Wayne Xin Zhao; Ji-Rong Wen

arXiv:2205.00656 · cs.CL · May 3, 2022

TL;DR

This paper introduces DCLR, a debiased contrastive learning framework that improves unsupervised sentence representations by addressing negative sampling bias, leading to better semantic similarity performance.

Contribution

The paper proposes two components to mitigate negative sampling bias in contrastive learning for sentence representations: an instance weighting method and a noise-based negative sampling method.
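
The Contribution line names the two mechanisms but gives no formulas, so the sketch below is one plausible reading under stated assumptions, not the authors' implementation (the rucaibox/dclr repository is the authoritative source). Assumptions: (1) instance weights come from cosine similarity under a separate reference encoder, and any in-batch negative scoring above a hypothetical threshold `phi` is treated as a likely false negative and given weight 0; (2) noise-based negatives are drawn once from an isotropic Gaussian and are not refined further; (3) both enter an InfoNCE-style loss with temperature `tau`. All function names and default values are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def instance_weights(ref_emb: np.ndarray, phi: float = 0.9) -> np.ndarray:
    """Hypothetical weighting rule (an assumption, not the paper's exact formula).

    ref_emb: (n, d) batch embeddings from a separate reference encoder.
    Returns an (n, n) matrix w where w[i, j] weights in-batch negative j for
    anchor i: 0 if the reference similarity exceeds phi (likely false negative),
    otherwise 1. The diagonal is 0 because j == i is the positive pair.
    """
    z = l2_normalize(ref_emb)
    w = (z @ z.T <= phi).astype(float)
    np.fill_diagonal(w, 0.0)
    return w


def gaussian_noise_negatives(k: int, dim: int, std: float = 1.0, seed: int = 0) -> np.ndarray:
    """Draw k noise negatives from an isotropic Gaussian (assumed distribution).
    They are sampled once; this sketch applies no further optimization."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, std, size=(k, dim))


def weighted_infonce(anchors, positives, weights, noise=None, tau: float = 0.05) -> float:
    """InfoNCE-style loss in which each in-batch negative is scaled by its weight.

    anchors, positives: (n, d); weights: (n, n) with a zero diagonal;
    noise: optional (k, d) extra negatives, each entering with weight 1.
    Cosine similarities lie in [-1, 1], so exp(sim / tau) stays finite in
    float64 for moderate tau.
    """
    a, p = l2_normalize(anchors), l2_normalize(positives)
    exp_sim = np.exp(a @ p.T / tau)          # (n, n): anchor i vs. positive j
    pos = np.diag(exp_sim)                   # matching positive pair per anchor
    neg = (weights * exp_sim).sum(axis=1)    # weighted in-batch negatives
    if noise is not None:
        neg = neg + np.exp(a @ l2_normalize(noise).T / tau).sum(axis=1)
    return float(np.mean(-np.log(pos / (pos + neg))))
```

Setting `weights = 1 - np.eye(n)` with `noise=None` recovers the ordinary in-batch InfoNCE objective, so the effect of each debiasing component can be checked one at a time against that baseline.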

Findings

01. Outperforms baseline methods on seven semantic textual similarity (STS) tasks.

02. Effectively reduces false negatives and improves representation uniformity.

03. Demonstrates robustness across various unsupervised settings.
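
The first finding refers to STS benchmarks. Under the protocol commonly used in this line of work (an assumption here, since the page does not spell it out), each sentence pair is embedded, scored by cosine similarity, and the scores are compared with human similarity ratings using Spearman's rank correlation. A minimal NumPy sketch with hypothetical function names:

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman's rho, computed as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    rx, ry = rx - rx.mean(), ry - ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))


def sts_spearman(emb_a: np.ndarray, emb_b: np.ndarray, gold) -> float:
    """Correlate the cosine similarity of each sentence pair with gold ratings."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)
    return spearman(cos, gold)
```

If SciPy is available, `scipy.stats.spearmanr` computes the same statistic; the hand-rolled version only keeps the sketch free of dependencies beyond NumPy.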

Abstract

Recently, contrastive learning has been shown to be effective in improving pre-trained language models (PLMs) to derive high-quality sentence representations. It aims to pull positive examples close together to enhance alignment, while pushing irrelevant negatives apart to achieve uniformity across the whole representation space. However, previous works mostly adopt in-batch negatives or sample negatives from the training data at random. This can introduce sampling bias: improper negatives (e.g., false negatives and anisotropic representations) are used to learn sentence representations, which hurts the uniformity of the representation space. To address this, we present a new framework, DCLR (Debiased Contrastive Learning of unsupervised sentence Representations), to alleviate the influence of these improper negatives. In DCLR, we design an instance…
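
The abstract frames the objective in terms of alignment (positives close together) and uniformity (representations spread over the space). One common way to quantify both is the pair of measures from Wang and Isola (2020): mean squared distance between normalized positive pairs, and the log of the mean Gaussian potential over distinct pairs. Whether DCLR reports exactly these quantities, and with these defaults (`alpha=2`, `t=2`), is an assumption of this sketch; lower is better for both.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Project each row onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def alignment(x: np.ndarray, x_pos: np.ndarray, alpha: float = 2.0) -> float:
    """Mean distance**alpha between normalized positive pairs (lower = better aligned)."""
    d = np.linalg.norm(l2_normalize(x) - l2_normalize(x_pos), axis=-1)
    return float(np.mean(d ** alpha))


def uniformity(x: np.ndarray, t: float = 2.0) -> float:
    """log of mean exp(-t * squared distance) over distinct pairs
    (lower = more uniform; 0 means every embedding has collapsed to one point)."""
    z = l2_normalize(x)
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)  # (n, n)
    i, j = np.triu_indices(len(z), k=1)                             # each pair once
    return float(np.log(np.mean(np.exp(-t * sq_dist[i, j]))))
```

A fully collapsed (maximally anisotropic) set of embeddings scores 0 on `uniformity`, while embeddings spread around the sphere score below 0, which is the property that improper negatives are said to hurt.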

Code & Models

Repositories

rucaibox/dclr
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Learning