PAUSE: Positive and Annealed Unlabeled Sentence Embedding
Lele Cao, Emil Larsson, Vilhelm von Ehrenheim, Dhiana Deva Cavalcanti Rocha, Anna Martin, Sonja Horn

TL;DR
PAUSE learns high-quality sentence embeddings from partially labeled data. It matches, and sometimes surpasses, state-of-the-art methods while using only a small fraction of the labels, which reduces manual annotation effort.
Contribution
The paper introduces PAUSE, an end-to-end approach that effectively learns sentence embeddings from limited labeled data, applicable in industrial scenarios.
Findings
PAUSE matches, and sometimes surpasses, state-of-the-art results on benchmark tasks.
PAUSE performs well with only a small fraction of labeled data.
PAUSE reduces the need for extensive manual annotation in real-world applications.
Abstract
Sentence embedding refers to a set of effective and versatile techniques for converting raw text into numerical vector representations that can be used in a wide range of natural language processing (NLP) applications. The majority of these techniques are either supervised or unsupervised. Compared to the unsupervised methods, the supervised ones make fewer assumptions about optimization objectives and usually achieve better results. However, the training requires a large number of labeled sentence pairs, which are not available in many industrial scenarios. To that end, we propose a generic and end-to-end approach -- PAUSE (Positive and Annealed Unlabeled Sentence Embedding), capable of learning high-quality sentence embeddings from a partially labeled dataset. We experimentally show that PAUSE achieves, and sometimes surpasses, state-of-the-art results using only a small fraction of…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Linear Warmup With Linear Decay · Weight Decay · Adam · Residual Connection · Multi-Head Attention · Softmax
