Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling

Kazuya Kawakami; Chris Dyer; Phil Blunsom

arXiv:1704.06986·cs.CL·April 25, 2017·27 cites

Open Access

TL;DR

This paper introduces a hierarchical LSTM language model with a caching mechanism to better handle the creation and reuse of new words in open-vocabulary settings, validated across multiple languages.

Contribution

It proposes a novel caching mechanism integrated into a hierarchical LSTM to improve open-vocabulary language modeling, capturing word reuse patterns.

Findings

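01. The model outperforms the baseline across multiple languages

02. The caching mechanism captures the reuse of previously generated words, while character-by-character generation allows new word types to be created

03. Validated on the Multilingual Wikipedia Corpus (MWC), built from comparable Wikipedia articles in 7 typologically diverse languages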

Abstract

Fixed-vocabulary language models fail to account for one of the most characteristic statistical facts of natural language: the frequent creation and reuse of new word types. Although character-level language models offer a partial solution in that they can create word types not attested in the training corpus, they do not capture the "bursty" distribution of such words. In this paper, we augment a hierarchical LSTM language model that generates sequences of word tokens character by character with a caching mechanism that learns to reuse previously generated words. To validate our model we construct a new open-vocabulary language modeling corpus (the Multilingual Wikipedia Corpus, MWC) from comparable Wikipedia articles in 7 typologically diverse languages and demonstrate the effectiveness of our model across this range of languages.
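
The idea of combining character-by-character generation with a cache of previously generated words can be illustrated with a toy mixture model. The sketch below is not the paper's architecture: a uniform character distribution stands in for the hierarchical LSTM, a count-based cache stands in for the learned caching mechanism, and the mixture weight is a fixed constant rather than something the model learns. All names (`char_model_prob`, `cache_prob`, `mixture_prob`, `reuse_weight`) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: a toy 26-letter alphabet plus one end-of-word symbol.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def char_model_prob(word: str) -> float:
    """Toy stand-in for a character-level generator.

    Each character, and the final end-of-word symbol, is drawn uniformly,
    so any spelling (including an unseen one) gets nonzero probability.
    """
    p_symbol = 1.0 / (len(ALPHABET) + 1)
    return p_symbol ** (len(word) + 1)


def cache_prob(word: str, history: list) -> float:
    """Toy stand-in for the cache: relative frequency among earlier words."""
    if not history:
        return 0.0
    return Counter(history)[word] / len(history)


def mixture_prob(word: str, history: list, reuse_weight: float = 0.3) -> float:
    """Mix the two sources: reuse a cached word or spell the word out.

    Assumption: reuse_weight is a fixed constant here; with an empty
    history the model can only spell words out.
    """
    lam = reuse_weight if history else 0.0
    return lam * cache_prob(word, history) + (1.0 - lam) * char_model_prob(word)


if __name__ == "__main__":
    history = ["zorblat", "the", "zorblat"]
    # A rare word that already appeared is far more likely under the mixture
    # than under character-level spelling alone.
    print(char_model_prob("zorblat"))
    print(mixture_prob("zorblat", history))
    # A word not yet in the cache still receives nonzero probability.
    print(mixture_prob("quux", history))
```

The point of the mixture is that a newly coined word is expensive to spell out the first time but cheap to repeat, which is one simple way to reflect the "bursty" distribution the abstract describes.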

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory