Retrofitting Contextualized Word Embeddings with Paraphrases

Weijia Shi; Muhao Chen; Pei Zhou; Kai-Wei Chang

arXiv:1909.09700 · cs.CL · September 27, 2019

TL;DR

This paper introduces a method to improve the robustness of contextualized word embeddings like ELMo against paraphrasing by learning an orthogonal transformation that stabilizes word representations across paraphrased contexts.

Contribution

We propose a novel retrofitting approach that enhances the stability of contextualized embeddings by minimizing their variance on paraphrased contexts, improving downstream task performance.

Findings

01. Retrofitted embeddings outperform original ELMo on sentence classification.

02. The method significantly improves robustness to paraphrasing.

03. Enhanced embeddings lead to better language inference results.

Abstract

Contextualized word embedding models, such as ELMo, generate meaningful representations of words and their context. These models have been shown to have a great impact on downstream applications. However, in many cases, the contextualized embedding of a word changes drastically when the context is paraphrased. As a result, the downstream model is not robust to paraphrasing and other linguistic variations. To enhance the stability of contextualized word embedding models, we propose an approach to retrofitting contextualized embedding models with paraphrase contexts. Our method learns an orthogonal transformation on the input space, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the retrofitted model significantly outperforms the original ELMo on various sentence classification and language inference tasks.
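The idea of learning a near-orthogonal transformation that reduces the variance of a word's representation across paraphrased contexts can be illustrated with a small NumPy sketch. This is not the authors' implementation: it omits the ELMo encoder entirely and applies the matrix directly to toy vectors, and the names `paraphrase_loss`, `fit_transformation`, `lam`, `lr`, and `steps` are hypothetical. The objective shown here (mean squared distance between paired vectors plus a soft orthogonality penalty) is an assumption chosen for simplicity; the paper's actual objective may contain further terms.

```python
import numpy as np


def paraphrase_loss(M, X, Y, lam):
    """Mean squared distance between transformed pairs plus a soft
    orthogonality penalty ||M^T M - I||_F^2 (assumed form, for illustration)."""
    diff = (X - Y) @ M.T                      # rows are M (x_i - y_i)
    distance_term = np.mean(np.sum(diff ** 2, axis=1))
    ortho_term = np.sum((M.T @ M - np.eye(M.shape[1])) ** 2)
    return distance_term + lam * ortho_term


def fit_transformation(X, Y, lam=1.0, lr=0.01, steps=500):
    """Plain gradient descent on paraphrase_loss, starting from the identity."""
    n, d = X.shape
    M = np.eye(d)
    diff = X - Y
    C = diff.T @ diff / n                     # second moment of the pair differences
    for _ in range(steps):
        # d/dM of the distance term is 2 M C; of the penalty, 4 M (M^T M - I).
        grad = 2.0 * M @ C + 4.0 * lam * M @ (M.T @ M - np.eye(d))
        M = M - lr * grad
    return M


# Toy data: X[i] and Y[i] stand in for the embedding of the same word in two
# paraphrased contexts. The disagreement is confined to the first two dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
noise = np.zeros_like(X)
noise[:, :2] = 0.5 * rng.normal(size=(200, 2))
Y = X + noise

lam = 1.0
before = paraphrase_loss(np.eye(8), X, Y, lam)
M = fit_transformation(X, Y, lam=lam)
after = paraphrase_loss(M, X, Y, lam)
ortho_gap = np.linalg.norm(M.T @ M - np.eye(8))

print(f"loss before: {before:.4f}, after: {after:.4f}")
print(f"deviation from orthogonality: {ortho_gap:.4f}")
```

One design point worth noting: an exactly orthogonal matrix preserves distances, so in this toy setting the distance between paired vectors can only shrink because the orthogonality constraint is soft. In the paper the transformation is applied on the input space of the contextualized model rather than on its outputs, so the effect on the final representations is mediated by the encoder, which this sketch does not model.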

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Bidirectional LSTM · Softmax · ELMo