Unsupervised Sentence Representation Learning with Frequency-induced Adversarial Tuning and Incomplete Sentence Filtering

Bing Wang; Ximing Li; Zhiyao Yang; Yuanyuan Guan; Jiayin Li; Shengsheng Wang

arXiv:2305.08655·cs.CL·May 16, 2023·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces a novel unsupervised sentence representation learning framework that uses frequency-based adversarial tuning and incomplete sentence filtering to improve embedding quality by reducing frequency bias.

Contribution

It proposes a flexible, plug-and-play USRL framework, SLT-FAI, that leverages word frequency information and adversarial training to produce more uniform and informative sentence embeddings.

Findings

01. SLT-FAI outperforms existing USRL methods on benchmark datasets.

02. The framework effectively reduces frequency bias in sentence embeddings.

03. Incorporating incomplete sentence filtering enhances low-frequency word representation.
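
Frequency bias of the kind described in these findings is usually discussed together with anisotropy, where embeddings crowd into a narrow region of the space. One common diagnostic is the mean pairwise cosine similarity of a set of embeddings: values near 1 suggest a clustered, anisotropic set, while values near 0 or below suggest a more spread-out one. The sketch below illustrates that diagnostic only. It is not taken from the paper or its repository, the function names `cosine_similarity` and `mean_pairwise_cosine` are hypothetical, and the two-dimensional vectors are toy data rather than real sentence embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all unordered pairs.

    Assumes at least two embeddings and no zero vectors.
    """
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy data (assumption): three vectors pointing in nearly the same direction,
# and three vectors pointing in clearly different directions.
clustered = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.0]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

clustered_score = mean_pairwise_cosine(clustered)  # close to 1
spread_score = mean_pairwise_cosine(spread)        # below 0 for this toy set
```

Comparing such a score before and after fine-tuning is one plausible way to check whether an embedding space has become less clustered; the evaluation actually used by the authors may differ.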

Abstract

Pre-trained Language Models (PLMs) are nowadays the mainstay of Unsupervised Sentence Representation Learning (USRL). However, PLMs are sensitive to the frequency information of words in their pre-training corpora, which results in an anisotropic embedding space where the embeddings of high-frequency words are clustered while those of low-frequency words are sparsely dispersed. This anisotropy causes two problems, similarity bias and information bias, which lower the quality of sentence embeddings. To solve these problems, we fine-tune PLMs by leveraging the frequency information of words and propose a novel USRL framework, namely Sentence Representation Learning with Frequency-induced Adversarial tuning and Incomplete sentence filtering (SLT-FAI). We calculate the word frequencies over the pre-training corpora of PLMs and assign words thresholded frequency labels. With them, (1) we…
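
The frequency-labelling step mentioned in the abstract (count word frequencies over a corpus, then threshold them into labels) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny `toy_corpus` stands in for a PLM pre-training corpus, whitespace tokenisation replaces the PLM's tokenizer, the labels are assumed to be binary, and the threshold value of 2 is arbitrary. The function names are hypothetical.

```python
from collections import Counter


def count_word_frequencies(corpus):
    """Count lower-cased, whitespace-separated words over an iterable of sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(sentence.lower().split())
    return counts


def assign_frequency_labels(counts, threshold):
    """Label a word 1 (high-frequency) if its count is >= threshold, else 0.

    Binary labels and the threshold rule are assumptions made for illustration.
    """
    return {word: int(count >= threshold) for word, count in counts.items()}


# Toy stand-in for a pre-training corpus (assumption).
toy_corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a heron waded past the dog",
]

word_counts = count_word_frequencies(toy_corpus)
frequency_labels = assign_frequency_labels(word_counts, threshold=2)

# Words labelled as high-frequency in this toy corpus.
high_frequency_words = sorted(w for w, label in frequency_labels.items() if label == 1)
```

In this toy corpus only "the", "cat", and "dog" occur at least twice, so they receive the high-frequency label and every other word receives the low-frequency one. How the paper chooses its threshold and uses the labels downstream is not recoverable from the truncated abstract.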

Code & Models

Repositories

wangbing1416/slt-fai
jax · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification