A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings
Haochen Tan, Wei Shao, Han Wu, Ke Yang, Linqi Song

TL;DR
This paper introduces Pseudo-Token BERT, a semantics-aware contrastive learning framework that improves sentence embeddings by focusing on latent semantic representations and reducing superficial feature influence, outperforming state-of-the-art methods.
Contribution
The paper proposes a novel pseudo-token based contrastive learning framework that effectively captures semantic content while eliminating superficial feature effects in sentence embeddings.
Findings
Outperforms state-of-the-art methods on six STS tasks
Effectively reduces superficial feature influence
Enhances embedding quality for varied sentence structures
Abstract
Contrastive learning has shown great potential in unsupervised sentence embedding tasks, e.g., SimCSE. However, we find that these existing solutions are heavily affected by superficial features such as sentence length or syntactic structure. In this paper, we propose a semantics-aware contrastive learning framework for sentence embeddings, termed Pseudo-Token BERT (PT-BERT), which is able to exploit the pseudo-token space (i.e., latent semantic space) representation of a sentence while eliminating the impact of superficial features such as sentence length and syntax. Specifically, we introduce an additional pseudo token embedding layer, independent of the BERT encoder, that maps each sentence into a fixed-length sequence of pseudo tokens. Leveraging these pseudo sequences, we are able to construct same-length positive and negative pairs based on the attention mechanism to perform…
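The fixed-length mapping described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes the following, none of which is confirmed by the abstract: 128 learnable pseudo-token vectors act as single-head attention queries over a sentence's token representations, so a sentence of any length comes out as exactly 128 vectors. The helper names `pseudo_token_attention`, `mean_pool`, and `info_nce`, as well as the temperature of 0.05, are hypothetical. Because the abstract is cut off before it describes the exact objective, a standard SimCSE-style in-batch InfoNCE loss stands in for it here. This is not the authors' implementation.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pseudo_token_attention(pseudo_tokens, token_states):
    """Hypothetical sketch: map a variable-length sentence to len(pseudo_tokens) vectors.

    Each pseudo token is a query. The sentence's token states serve as both
    keys and values (single-head scaled dot-product attention, an assumption).
    """
    d = len(token_states[0])
    out = []
    for q in pseudo_tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in token_states])
        out.append([sum(w * v[j] for w, v in zip(weights, token_states)) for j in range(d)])
    return out

def mean_pool(vectors):
    # Simplification: average the pseudo-token outputs into one sentence vector.
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)

def info_nce(anchors, positives, temperature=0.05):
    """In-batch InfoNCE (assumed objective): anchors[i] should match positives[i].

    All other rows in `positives` act as negatives.
    """
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]
    return loss / len(anchors)

# Usage: random stand-ins for encoder outputs of a 5-token and a 23-token sentence.
random.seed(0)
D, NUM_PSEUDO = 8, 128
pseudo = [[random.gauss(0, 1) for _ in range(D)] for _ in range(NUM_PSEUDO)]
short = [[random.gauss(0, 1) for _ in range(D)] for _ in range(5)]
long_ = [[random.gauss(0, 1) for _ in range(D)] for _ in range(23)]
short_seq = pseudo_token_attention(pseudo, short)
long_seq = pseudo_token_attention(pseudo, long_)
print(len(short_seq), len(long_seq))  # both sentences become 128 pseudo-token vectors
```

Both sentences come out the same length, which is why positive and negative pairs built from these pseudo sequences can no longer be told apart by length alone. In a real model the pseudo-token vectors would be trained parameters, not random draws.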
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Contrastive Learning · Dropout · Dense Connections · Residual Connection · Weight Decay · Layer Normalization
