A Factorized Recurrent Neural Network based architecture for medium to large vocabulary Language Modelling

Anantharaman Palacode Narayana Iyer

arXiv:1602.01576·cs.CL·February 5, 2016

TL;DR

This paper introduces a novel factorized RNN architecture for large vocabulary language modeling that significantly reduces computational complexity and memory usage, enabling faster training without multistep prediction.

Contribution

It proposes an optimized factorized output layer for RNNs that improves efficiency and eliminates the need for multistep prediction in large vocabulary language models.

Findings

1. Speeds up language model training on large vocabularies
2. Reduces memory requirements for RNN output layers
3. Eliminates the multistep prediction process

Abstract

Statistical language models are central to many applications that use semantics. Recurrent Neural Networks (RNNs) are known to produce state-of-the-art results for language modelling, outperforming their traditional n-gram counterparts in many cases. To generate a probability distribution across a vocabulary, these models require a softmax output layer that grows linearly with the size of the vocabulary. Large vocabularies need a commensurately large softmax layer, and training them on typical laptops/PCs requires significant time and machine resources. In this paper we present a new technique for implementing RNN-based large vocabulary language models that substantially speeds up computation while optimally using the limited memory resources. Our technique, while building on the notion of factorizing the output layer by having multiple output layers, improves on the earlier…
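To make the idea of "factorizing the output layer" concrete, here is a minimal sketch of the generic class-based factorization that earlier work uses: P(word | h) = P(class | h) × P(word | class, h). It is not the paper's improved technique, which the truncated abstract above does not describe. The class name `FactorizedOutput`, the contiguous assignment of word ids to classes, and the random weights are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(row, h):
    return sum(r * x for r, x in zip(row, h))


class FactorizedOutput:
    """Hypothetical two-level output layer: a class head plus per-class word heads.

    Assumption: word ids are assigned to classes in contiguous blocks of
    `class_size`; real systems often group words by frequency instead.
    """

    def __init__(self, vocab_size, class_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.class_size = class_size
        self.num_classes = math.ceil(vocab_size / class_size)
        # One weight row per class and one per word (untrained, random values).
        self.class_weights = [[rng.gauss(0, 0.1) for _ in range(hidden_size)]
                              for _ in range(self.num_classes)]
        self.word_weights = [[rng.gauss(0, 0.1) for _ in range(hidden_size)]
                             for _ in range(vocab_size)]

    def members(self, c):
        """Word ids belonging to class c (the last class may be smaller)."""
        start = c * self.class_size
        return range(start, min(start + self.class_size, self.vocab_size))

    def word_prob(self, h, word):
        """P(word | h) = P(class | h) * P(word | class, h)."""
        c = word // self.class_size
        p_class = softmax([dot(w, h) for w in self.class_weights])[c]
        ids = self.members(c)
        p_in_class = softmax([dot(self.word_weights[i], h) for i in ids])
        return p_class * p_in_class[word - ids.start]

    def rows_scored(self):
        """Upper bound on weight rows touched per prediction."""
        return self.num_classes + self.class_size


layer = FactorizedOutput(vocab_size=12, class_size=4, hidden_size=3)
h = [0.5, -1.0, 0.25]  # stand-in for an RNN hidden state
total = sum(layer.word_prob(h, w) for w in range(12))
```

Under these assumptions, scoring one target word touches at most `num_classes + class_size` weight rows instead of all `vocab_size` rows. For a 10,000-word vocabulary with classes of 100 words, that is 200 rows rather than 10,000, while the probabilities over the whole vocabulary still sum to 1.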

Taxonomy

Methods: Softmax