Retrofitting Contextualized Word Embeddings with Paraphrases
Weijia Shi, Muhao Chen, Pei Zhou, Kai-Wei Chang

TL;DR
This paper introduces a method to improve the robustness of contextualized word embeddings like ELMo against paraphrasing by learning an orthogonal transformation that stabilizes word representations across paraphrased contexts.
Contribution
We propose a retrofitting approach that stabilizes contextualized embeddings by minimizing the variance of a word's representation across paraphrased contexts, which in turn improves downstream task performance.
Findings
Retrofitted embeddings outperform original ELMo on sentence classification.
The method significantly improves robustness to paraphrasing.
Enhanced embeddings lead to better language inference results.
Abstract
Contextualized word embedding models, such as ELMo, generate meaningful representations of words and their context. These models have been shown to have a great impact on downstream applications. However, in many cases, the contextualized embedding of a word changes drastically when the context is paraphrased. As a result, the downstream model is not robust to paraphrasing and other linguistic variations. To enhance the stability of contextualized word embedding models, we propose an approach to retrofitting contextualized embedding models with paraphrase contexts. Our method learns an orthogonal transformation on the input space, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the retrofitted model significantly outperforms the original ELMo on various sentence classification and language inference tasks.
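The idea in the abstract can be illustrated with a small sketch. The abstract gives only the outline (an orthogonal transformation on the input space that minimizes the variance of word representations across paraphrased contexts), so everything below is an assumption made for illustration: the function names, the toy encoder standing in for ELMo, the squared-distance variance measure, and the soft orthogonality penalty weighted by a hypothetical `lam`. The paper's actual loss may differ.

```python
import numpy as np


def orthogonality_penalty(M):
    """Squared Frobenius norm of (I - M^T M); zero exactly when M is orthogonal."""
    d = M.shape[1]
    return float(np.sum((np.eye(d) - M.T @ M) ** 2))


def paraphrase_variance(reps):
    """Mean squared distance of one word's contextual vectors to their centroid.

    reps has shape (n_contexts, dim): the same word encoded in several
    paraphrased contexts.
    """
    reps = np.asarray(reps, dtype=float)
    centroid = reps.mean(axis=0)
    return float(np.mean(np.sum((reps - centroid) ** 2, axis=1)))


def toy_encoder(static_vecs):
    """Hypothetical stand-in for a contextual encoder such as ELMo.

    Each word's output mixes its own input vector with the sentence mean,
    so the output depends on context. This is NOT ELMo.
    """
    static_vecs = np.asarray(static_vecs, dtype=float)
    return np.tanh(static_vecs + static_vecs.mean(axis=0, keepdims=True))


def retrofit_objective(M, sent_a, sent_b, idx_a, idx_b, lam=1.0):
    """Assumed objective: instability of a shared word plus orthogonality penalty.

    M is applied to the *input* (static) vectors before encoding; idx_a and
    idx_b are the positions of the shared word in the two paraphrases.
    """
    out_a = toy_encoder(np.asarray(sent_a, dtype=float) @ M)[idx_a]
    out_b = toy_encoder(np.asarray(sent_b, dtype=float) @ M)[idx_b]
    return paraphrase_variance([out_a, out_b]) + lam * orthogonality_penalty(M)
```

In this sketch the transformation is applied before the encoder because an orthogonal map applied to the encoder's outputs would preserve Euclidean distances and so could not bring paraphrase representations closer together. With `M` set to the identity matrix the penalty term is zero, and the objective reduces to the variance of the shared word's two representations.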
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Bidirectional LSTM · Softmax · ELMo
