Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling
Kazuya Kawakami, Chris Dyer, Phil Blunsom

TL;DR
This paper augments a hierarchical LSTM language model with a caching mechanism so that it can both create new words and reuse them in open-vocabulary settings. The model is validated on a new corpus of Wikipedia text in 7 typologically diverse languages.
Contribution
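The paper adds a cache to a hierarchical LSTM that generates word tokens character by character. The cache learns to reuse previously generated words, which captures the "bursty" reuse of new word types that character-level models alone miss. The paper also introduces the Multilingual Wikipedia Corpus (MWC) for open-vocabulary language modeling.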
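Because words are generated character by character, a character-level model can assign probability to any string, including word types never seen in training. The sketch below shows that scoring step under stated assumptions. The `char_prob` callable stands in for the paper's character-level LSTM, and the end-of-word symbol `EOW` is a hypothetical name. Any conditional character distribution can be plugged in.

```python
import math

EOW = "</w>"  # hypothetical end-of-word symbol that terminates each spelling


def word_log_prob(word, char_prob):
    """Log-probability of spelling `word` one character at a time.

    `char_prob(prefix, c)` must return p(c | prefix). In the paper this role
    is played by a character-level LSTM conditioned on word-level context;
    here it is any callable (an assumption for illustration).
    """
    symbols = list(word) + [EOW]
    total = 0.0
    for i, c in enumerate(symbols):
        prefix = tuple(symbols[:i])  # characters already emitted for this word
        total += math.log(char_prob(prefix, c))
    return total


def uniform_char_prob(prefix, c):
    """Toy model: uniform over 26 letters plus the end-of-word symbol."""
    return 1.0 / 27


print(word_log_prob("cat", uniform_char_prob))  # 4 symbols, so 4 * log(1/27)
```

Since the end-of-word symbol is scored too, the probabilities of all finite spellings form a proper distribution over an unbounded vocabulary.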
Findings
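The model outperforms baseline models across multiple languages
It effectively captures both the creation and the reuse of words
It is validated on the new Multilingual Wikipedia Corpus (MWC), built from comparable Wikipedia articles in 7 typologically diverse languages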
Abstract
Fixed-vocabulary language models fail to account for one of the most characteristic statistical facts of natural language: the frequent creation and reuse of new word types. Although character-level language models offer a partial solution in that they can create word types not attested in the training corpus, they do not capture the "bursty" distribution of such words. In this paper, we augment a hierarchical LSTM language model that generates sequences of word tokens character by character with a caching mechanism that learns to reuse previously generated words. To validate our model we construct a new open-vocabulary language modeling corpus (the Multilingual Wikipedia Corpus, MWC) from comparable Wikipedia articles in 7 typologically diverse languages and demonstrate the effectiveness of our model across this range of languages.
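The caching mechanism can be pictured as a mixture of two sources: spelling a word with the character-level model, or copying a word already stored in the cache. The sketch below assumes a gated mixture p(w) = g * p_lm(w) + (1 - g) * p_cache(w), where p_cache is a softmax over cache slots. The gate `g`, the slot `scores`, and all function names are hypothetical inputs. The paper learns such quantities from the LSTM state, and its exact parameterization may differ.

```python
import math
from collections import defaultdict


def cache_distribution(cache_words, scores):
    """Softmax over cache slots, pooling the mass of repeated words."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    dist = defaultdict(float)
    for w, e in zip(cache_words, exps):
        dist[w] += e / z  # a word cached twice collects both slots' mass
    return dict(dist)


def mixture_prob(word, lm_prob, cache_words, scores, gate):
    """Assumed gated mixture: gate * p_lm(word) + (1 - gate) * p_cache(word).

    `lm_prob` is the character-level model's probability of `word`;
    `gate` in [0, 1] weights generating against copying from the cache.
    """
    if cache_words:
        p_cache = cache_distribution(cache_words, scores).get(word, 0.0)
    else:
        p_cache = 0.0
    return gate * lm_prob + (1.0 - gate) * p_cache


# A rare name that appeared twice earlier in the document.
cache = ["Kawakami", "the", "Kawakami"]
print(mixture_prob("Kawakami", 1e-6, cache, [0.0, 0.0, 0.0], 0.5))
```

With equal slot scores the name receives 2/3 of the cache mass. Its mixed probability is therefore far above the tiny probability of spelling it from scratch, which is the "bursty" reuse effect that character-level models alone miss.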
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
