A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings

Haochen Tan; Wei Shao; Han Wu; Ke Yang; Linqi Song

arXiv:2203.05877·cs.CL·March 14, 2022·5 cites

Open Access · 1 Repo

TL;DR

This paper introduces Pseudo-Token BERT (PT-BERT), a semantics-aware contrastive learning framework for sentence embeddings. It represents each sentence in a latent pseudo-token space to reduce the influence of superficial features such as sentence length and syntax, and is reported to outperform state-of-the-art methods on six STS tasks.

Contribution

The paper proposes a novel pseudo-token based contrastive learning framework that effectively captures semantic content while eliminating superficial feature effects in sentence embeddings.

Findings

1. Outperforms state-of-the-art on six STS tasks
2. Effectively reduces superficial feature influence
3. Enhances embedding quality for varied sentence structures

Abstract

Contrastive learning has shown great potential in unsupervised sentence embedding tasks, e.g., SimCSE. However, we find that these existing solutions are heavily affected by superficial features such as sentence length or syntactic structure. In this paper, we propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT), which exploits the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax. Specifically, we introduce an additional pseudo-token embedding layer, independent of the BERT encoder, to map each sentence into a fixed-length sequence of pseudo tokens. Leveraging these pseudo sequences, we are able to construct same-length positive and negative pairs based on the attention mechanism to perform…
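The mapping from a variable-length sentence to a fixed-length pseudo sequence can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes single-head scaled dot-product attention in which a table of pseudo-token embeddings acts as the queries and the sentence's token representations act as keys and values. The function name `to_pseudo_sequence`, the hidden size of 16, and the random inputs are hypothetical choices for illustration; the official repository uses PyTorch.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def to_pseudo_sequence(token_states, pseudo_tokens):
    """Map a sentence to a fixed-length pseudo sequence (illustrative assumption).

    token_states:  (seq_len, d) token representations of one sentence
    pseudo_tokens: (m, d) pseudo-token embeddings, used here as queries
    returns:       (m, d), independent of seq_len
    """
    d = token_states.shape[-1]
    # Each pseudo token scores every real token: shape (m, seq_len).
    scores = pseudo_tokens @ token_states.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Weighted sum of token representations per pseudo token: shape (m, d).
    return weights @ token_states


rng = np.random.default_rng(0)
hidden_dim, num_pseudo = 16, 128  # 128 pseudo tokens, as in the title

pseudo_tokens = rng.normal(size=(num_pseudo, hidden_dim))
short_sentence = rng.normal(size=(5, hidden_dim))   # 5 tokens
long_sentence = rng.normal(size=(40, hidden_dim))   # 40 tokens

short_pseudo = to_pseudo_sequence(short_sentence, pseudo_tokens)
long_pseudo = to_pseudo_sequence(long_sentence, pseudo_tokens)

print(short_pseudo.shape, long_pseudo.shape)  # (128, 16) (128, 16)
```

Both sentences end up with the same shape regardless of their original length, which is the property that allows same-length pairs to be compared in a contrastive objective.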

Code & Models

Repositories

namco0816/pt-bert (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Contrastive Learning · Dropout · Dense Connections · Residual Connection · Weight Decay · Layer Normalization