Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework

Yiming Chen; Yan Zhang; Bin Wang; Zuozhu Liu; Haizhou Li

arXiv:2210.16798 · cs.CL · November 1, 2022

TL;DR

This paper introduces GenSE, a semi-supervised framework for sentence embedding that synthesizes and filters sentence pairs from unlabeled data, improving performance on semantic similarity and domain adaptation tasks.

Contribution

The paper presents a novel semi-supervised approach combining generation, discrimination, and contrastive learning for sentence embeddings, outperforming existing methods.

Findings

01 Achieves 85.19 average correlation on STS datasets

02 Significantly improves domain adaptation performance

03 Outperforms state-of-the-art sentence embedding methods

Abstract

Most sentence embedding techniques rely heavily on expensive human-annotated sentence pairs as supervised signals. Despite their use of large-scale unlabeled data, unsupervised methods typically lag far behind their supervised counterparts on most downstream tasks. In this work, we propose a semi-supervised sentence embedding framework, GenSE, that effectively leverages large-scale unlabeled data. Our method includes three parts: 1) Generate: A generator/discriminator model is jointly trained to synthesize sentence pairs from an open-domain unlabeled corpus; 2) Discriminate: Noisy sentence pairs are filtered out by the discriminator to acquire high-quality positive and negative sentence pairs; 3) Contrast: A prompt-based contrastive approach is presented for sentence representation learning with both annotated and synthesized data. Comprehensive experiments show…
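The Discriminate and Contrast steps can be illustrated with a minimal sketch. This is not the paper's implementation: the confidence threshold, the temperature, the `confidence` dictionary key, and the function names `filter_pairs` and `contrastive_loss` are all assumptions made for illustration, and a generic InfoNCE-style loss stands in for the prompt-based contrastive objective, which this summary does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def filter_pairs(pairs, threshold=0.9):
    """Discriminate (hypothetical): keep synthesized pairs whose
    discriminator confidence is at least `threshold`.
    The 0.9 default is an assumed value, not taken from the paper."""
    return [p for p in pairs if p["confidence"] >= threshold]


def contrastive_loss(anchor, positive, negatives, temperature=0.05):
    """Contrast (generic InfoNCE-style stand-in): negative log-softmax of
    the anchor-positive similarity over all candidate similarities."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]
```

As a usage note, the loss is small when the anchor embedding sits close to its positive and far from its negatives, and it grows when a negative is more similar to the anchor than the positive is. In practice the embeddings would come from a trained encoder rather than hand-written vectors.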

Code & Models

Repositories

matthewcym/gense (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification