BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies

Shihao Ji; S. V. N. Vishwanathan; Nadathur Satish; Michael J. Anderson; and Pradeep Dubey

arXiv:1511.06909 · cs.LG · April 1, 2016 · ICLR · 35 cites

TL;DR

BlackOut is an efficient approximation algorithm for training large-scale RNN language models with massive vocabularies, achieving state-of-the-art results with reduced computation and training time.

Contribution

It introduces a novel sampling-based training method that extends DropOut to output layers, improving efficiency and stability in large vocabulary RNNLM training.

Findings

01. Outperforms existing methods on the one billion word benchmark

02. Achieves the lowest perplexity scores on this dataset

03. Requires only 1-10 days on a single machine for training

Abstract

We propose BlackOut, an approximation algorithm to efficiently train massive recurrent neural network language models (RNNLMs) with million-word vocabularies. BlackOut is motivated by using a discriminative loss, and we describe a new sampling strategy which significantly reduces computation while improving stability, sample efficiency, and rate of convergence. One way to understand BlackOut is to view it as an extension of the DropOut strategy to the output layer, wherein we use a discriminative training loss and a weighted sampling scheme. We also establish close connections between BlackOut, importance sampling, and noise contrastive estimation (NCE). Our experiments, on the recently released one billion word language modeling benchmark, demonstrate scalability and accuracy of BlackOut; we outperform the state-of-the-art, and achieve the lowest perplexity scores on this dataset.…
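
The abstract names the ingredients (a discriminative loss, a weighted sampling scheme, and a link to importance sampling) without spelling out the objective. The sketch below shows one way those pieces could fit together for a single prediction step. It is an illustration under stated assumptions, not the code from the official repository. The function name `blackout_loss`, the proposal distribution proportional to unigram count raised to `alpha`, sampling with replacement, and the toy inputs are all hypothetical choices made here.

```python
import math
import random


def blackout_loss(scores, target, unigram_counts, k, alpha=0.5, rng=None):
    """Sampled discriminative loss over the target word plus k sampled words.

    scores: one output-layer score per vocabulary word (plain floats).
    target: index of the correct next word.
    unigram_counts: positive corpus count per word, used for the proposal.
    alpha: assumed smoothing exponent for the proposal distribution.
    """
    rng = rng or random.Random(0)
    vocab = len(scores)

    # Assumed proposal: Q(w) proportional to count(w) ** alpha.
    weights = [c ** alpha for c in unigram_counts]
    total = sum(weights)
    q = [w / total for w in weights]

    # Draw k negative words from Q (with replacement), never the target.
    candidates = [w for w in range(vocab) if w != target]
    negatives = rng.choices(candidates, weights=[q[w] for w in candidates], k=k)

    # Importance-weighted softmax restricted to the target and the samples:
    # each exponentiated score is divided by its proposal probability.
    active = set(negatives) | {target}
    shift = max(scores[w] for w in active)  # numerical stability only
    terms = {w: math.exp(scores[w] - shift) / q[w] for w in active}
    norm = terms[target] + sum(terms[w] for w in negatives)

    # Discriminative part: raise the target's probability and push each
    # sampled word's probability toward zero.
    loss = -math.log(terms[target] / norm)
    loss -= sum(math.log(1.0 - terms[w] / norm) for w in negatives)
    return loss


if __name__ == "__main__":
    counts = [10, 5, 3, 2, 1]
    flat = blackout_loss([0.0] * 5, 0, counts, k=3, rng=random.Random(1))
    peaked = blackout_loss([5.0, 0.0, 0.0, 0.0, 0.0], 0, counts, k=3,
                           rng=random.Random(1))
    print(flat, peaked)
```

Only the target and the `k` sampled words enter the normalizer, so the cost of a step scales with `k` rather than with the vocabulary size. That is the source of the reduced computation the abstract describes.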

Code & Models

Repositories

IntelLabs/rnnlm · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Softmax · Dropout