Learning to Compute Word Embeddings On the Fly
Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

TL;DR
This paper introduces a method that predicts embeddings for rare words on the fly from small amounts of auxiliary data, improving on baselines whose embeddings are trained only on the end task.
Contribution
It proposes an end-to-end trainable network that predicts embeddings of rare words on the fly, reducing reliance on large external datasets.
Findings
Improved reading comprehension accuracy
Enhanced textual entailment performance
Better language modeling results
Abstract
Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data, or treat all rare words as out-of-vocabulary words with a unique representation. We provide a method for predicting embeddings of rare words on the fly from small amounts of auxiliary data with a network trained end-to-end for the downstream task. We show that this improves results against baselines where embeddings are trained on the end task for reading comprehension, recognizing textual entailment and language modeling.
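The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say what the auxiliary data is or how it is encoded, so everything below is an assumption made for illustration: the auxiliary data is taken to be a short dictionary-style definition, the encoder is plain mean pooling over the embeddings of the definition words, and the names `frequent_vocab`, `definitions`, `mean_pool`, and `embed` are hypothetical. Only the forward lookup is shown; in an end-to-end setup the embeddings would be trained through this computation by the downstream task loss.

```python
import random

EMB_DIM = 4
random.seed(0)

# Hypothetical vocabulary of frequent words, each with its own embedding.
# In a real model these vectors would be trainable parameters.
frequent_vocab = ["a", "small", "domesticated", "animal", "the", "sat"]
embeddings = {
    w: [random.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]
    for w in frequent_vocab
}

# Shared fallback vector for words with no embedding and no auxiliary data.
unk = [0.0] * EMB_DIM

# Hypothetical auxiliary data: a short definition for a rare word.
definitions = {"cat": ["a", "small", "domesticated", "animal"]}


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embed(word):
    """Return a stored embedding, or compute one from the word's definition."""
    if word in embeddings:
        return embeddings[word]
    # Rare word: keep only the definition words that have an embedding.
    known = [w for w in definitions.get(word, []) if w in embeddings]
    if not known:
        return unk
    # Assumed encoder: mean pooling over the definition words' embeddings.
    return mean_pool([embeddings[w] for w in known])


rare_vector = embed("cat")      # computed on the fly from the definition
frequent_vector = embed("the")  # looked up directly
```

Here `embed("cat")` is computed from the definition rather than stored, while a word with neither an embedding nor a definition falls back to the shared `unk` vector, which corresponds to the out-of-vocabulary treatment the abstract describes as the usual alternative.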
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
