Learning to Compute Word Embeddings On the Fly

Dzmitry Bahdanau; Tom Bosc; Stanisław Jastrzębski; Edward Grefenstette; Pascal Vincent; Yoshua Bengio

arXiv:1706.00286·cs.LG·March 8, 2018·67 cites

Open Access

TL;DR

This paper introduces a method for computing embeddings of rare words on the fly from small amounts of auxiliary data, improving over baselines whose embeddings are trained only on the end task.

Contribution

It proposes an end-to-end trainable network that predicts embeddings of rare words on the fly, reducing reliance on large external datasets.

Findings

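01 Improved reading comprehension results over baselines whose embeddings are trained only on the end task

02 Improved textual entailment recognition over the same kind of baseline

03 Improved language modeling results over the same kind of baseline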

Abstract

Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data, or treat all rare words as out-of-vocabulary words with a unique representation. We provide a method for predicting embeddings of rare words on the fly from small amounts of auxiliary data with a network trained end-to-end for the downstream task. We show that this improves results against baselines where embeddings are trained on the end task for reading comprehension, recognizing textual entailment and language modeling.
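
The idea of predicting a rare word's embedding from auxiliary data can be illustrated with a minimal sketch. The abstract does not specify the form of the auxiliary data or the network, so everything here is an assumption made for illustration: the auxiliary data is taken to be a short dictionary-style definition, the combiner is plain mean pooling, and the names `frequent_vocab`, `definitions`, `mean_pool`, and `embed` are hypothetical. The vectors are random placeholders; in an end-to-end setting they would be parameters updated by the downstream task's loss.

```python
import random

EMBED_DIM = 4
random.seed(0)

# Hypothetical toy vocabulary of frequent words. In a trained model these
# vectors would be learned on the end task; here they are random placeholders.
frequent_vocab = ["a", "small", "domesticated", "animal", "the", "cat", "sat"]
embeddings = {
    w: [random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for w in frequent_vocab
}

# Shared fallback vector for words with no embedding and no auxiliary data.
UNK = [0.0] * EMBED_DIM

# Hypothetical auxiliary data: a short definition for a rare word.
definitions = {"ferret": ["a", "small", "domesticated", "animal"]}


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(EMBED_DIM)]


def embed(word):
    """Return the stored embedding for a frequent word; otherwise compute
    one on the fly from the word's definition, falling back to UNK."""
    if word in embeddings:
        return embeddings[word]
    definition = definitions.get(word, [])
    known = [embeddings[t] for t in definition if t in embeddings]
    if known:
        return mean_pool(known)
    return UNK


print(embed("ferret"))   # computed from the definition tokens
print(embed("zyzzyva"))  # no definition available, so UNK is returned
```

Mean pooling is used only because it is the simplest differentiable combiner; a learned encoder over the auxiliary text could be swapped in for `mean_pool` without changing the lookup logic in `embed`.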

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification