Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration
Shufan Wang, Laure Thompson, Mohit Iyyer

TL;DR
This paper introduces Phrase-BERT, a fine-tuned BERT model that generates more meaningful phrase embeddings, improving phrase similarity tasks and enabling effective phrase-based topic modeling.
Contribution
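The paper proposes a contrastive fine-tuning objective for BERT, trained on automatically generated phrasal paraphrases and on phrases in context mined from the Books3 corpus, so that phrase embeddings reflect compositional meaning rather than lexical overlap alone.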
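As a rough illustration of what a contrastive objective over phrase embeddings can look like, the sketch below implements a triplet margin loss in plain Python. This is one common form of contrastive loss and is an assumption here: the summary does not state the exact formulation, distance function, or margin the authors use, and the toy 2-d vectors are hypothetical stand-ins for BERT phrase embeddings.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style contrastive loss (assumed form, not confirmed by the source).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, margin + gap)


# Hypothetical toy embeddings: a phrase, its paraphrase, and an unrelated phrase.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negative = [-1.0, 0.0]

# Paraphrase already close and unrelated phrase far away: no penalty.
loss_easy = triplet_margin_loss(anchor, positive, negative)

# Roles swapped: the "positive" is far and the "negative" is close, so the loss is large.
loss_hard = triplet_margin_loss(anchor, negative, positive)
```

Minimizing a loss of this shape pulls paraphrases together and pushes unrelated phrases apart, even when the unrelated phrase shares more surface words with the anchor.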
Findings
Outperforms baseline models on phrase similarity tasks
Increases lexical diversity among nearest neighbors in the embedding space
Enables phrase-based neural topic modeling with improved coherence
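One way to make the lexical-diversity finding concrete is to measure how many words a phrase shares with its nearest neighbors in embedding space. The sketch below is a hypothetical proxy, not the paper's metric: it assumes cosine similarity for neighbor search and Jaccard overlap of whitespace tokens, and the toy embeddings are invented for illustration. Lower overlap corresponds to higher lexical diversity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbors(query, embeddings, k):
    """Return the k phrases most cosine-similar to `query`, excluding itself."""
    others = [p for p in embeddings if p != query]
    others.sort(key=lambda p: cosine(embeddings[query], embeddings[p]), reverse=True)
    return others[:k]


def token_overlap(a, b):
    """Jaccard overlap between the whitespace-token sets of two phrases."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def mean_neighbor_overlap(query, embeddings, k=2):
    """Average token overlap between `query` and its k nearest neighbors."""
    neighbors = nearest_neighbors(query, embeddings, k)
    return sum(token_overlap(query, n) for n in neighbors) / len(neighbors)


# Hypothetical toy embeddings (2-d for readability).
toy = {
    "give up": [1.0, 0.0],
    "surrender": [0.95, 0.1],
    "give away": [0.2, 0.9],
    "hand out": [0.1, 1.0],
}

neighbors = nearest_neighbors("give up", toy, k=1)
overlap = mean_neighbor_overlap("give up", toy, k=1)
```

In this toy space the closest neighbor of "give up" is "surrender", which shares no tokens with it. An embedding space driven mainly by lexical similarity would instead tend to rank "give away" first.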
Abstract
Phrase representations derived from BERT often do not exhibit complex phrasal compositionality, as the model relies instead on lexical similarity to determine semantic relatedness. In this paper, we propose a contrastive fine-tuning objective that enables BERT to produce more powerful phrase embeddings. Our approach (Phrase-BERT) relies on a dataset of diverse phrasal paraphrases, which is automatically generated using a paraphrase generation model, as well as a large-scale dataset of phrases in context mined from the Books3 corpus. Phrase-BERT outperforms baselines across a variety of phrase-level similarity tasks, while also demonstrating increased lexical diversity between nearest neighbors in the vector space. Finally, as a case study, we show that Phrase-BERT embeddings can be easily integrated with a simple autoencoder to build a phrase-based neural topic model that interprets…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Dropout · Layer Normalization · Softmax · Residual Connection
