Using Paraphrases to Study Properties of Contextual Embeddings
Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

TL;DR
This paper uses aligned paraphrases from the Paraphrase Database to analyze BERT's contextual embeddings. It finds that BERT handles polysemous words well, often gives synonyms surprisingly different representations, and is sensitive to word order, with layer-wise contextualization patterns that differ slightly from prior work.
Contribution
It introduces a novel paraphrase-based approach to study properties of contextual embeddings, providing new insights into BERT's semantic handling and layer behaviors.
Findings
BERT effectively manages polysemous words.
Synonyms often have different representations in BERT.
BERT is sensitive to word order, confirming prior work, but its level of contextualization across layers follows slightly different patterns than previously reported.
Abstract
We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT. Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings. Using the Paraphrase Database's alignments, we study words within paraphrases as well as phrase representations. We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases. We confirm previous findings that BERT is sensitive to word order, but find slightly different patterns than prior work in terms of the level of contextualization across BERT's layers.
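As a rough illustration of what comparing words within aligned paraphrases can look like, the sketch below scores each aligned word pair by the cosine similarity of its two vectors. This is a minimal sketch under stated assumptions: the abstract does not say which similarity measure the authors use, the function names are hypothetical, and the toy vectors stand in for real BERT embeddings, which would have to be extracted separately.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def aligned_word_similarities(
    emb_a: Sequence[Vector],
    emb_b: Sequence[Vector],
    alignment: Sequence[Tuple[int, int]],
) -> List[float]:
    """Score each aligned (i, j) word pair across two paraphrases.

    emb_a[i] and emb_b[j] are the per-token vectors of the two phrases;
    alignment lists which token in phrase A maps to which token in phrase B.
    """
    return [cosine_similarity(emb_a[i], emb_b[j]) for i, j in alignment]


# Toy stand-ins for contextual embeddings (assumption: NOT real BERT output).
phrase_a = [[1.0, 0.0], [0.0, 2.0]]
phrase_b = [[0.0, 3.0], [2.0, 0.0]]
# Hypothetical alignment: token 0 of A maps to token 1 of B, and 1 maps to 0.
alignment = [(0, 1), (1, 0)]

similarities = aligned_word_similarities(phrase_a, phrase_b, alignment)
```

With real embeddings, a low score for an aligned pair of synonyms would correspond to the kind of divergence the abstract describes.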
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Adam · Residual Connection · Layer Normalization · Linear Warmup With Linear Decay · Weight Decay
