Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

Zhilin Yang; Zihang Dai; Ruslan Salakhutdinov; William W. Cohen

arXiv:1711.03953 · cs.CL · March 6, 2018 · 64 cites

TL;DR

This paper shows that the expressiveness of Softmax-based neural language models is limited by a Softmax bottleneck, and proposes a simple method to overcome it, improving perplexity on Penn Treebank, WikiText-2, and the 1B Word dataset.

Contribution

The paper formulates language modeling as a matrix factorization problem, argues that Softmax with distributed word embeddings lacks the capacity to model highly context-dependent natural language, and proposes a simple method that addresses this bottleneck.

Findings

01

Improved the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68, respectively.

02

Outperformed the baseline by over 5.6 perplexity points on the large-scale 1B Word dataset.

03

Showed that the expressiveness of Softmax-based models is limited by a Softmax bottleneck, which the proposed method is designed to address.

Abstract

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68, respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
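
The matrix factorization view in the abstract can be illustrated numerically. The sketch below is not the authors' code: it uses random matrices in place of a trained model, and the names `H`, `W`, `H_k`, `prior_logits` and the sizes `N`, `M`, `d`, `K` are assumptions chosen for illustration. It builds the N x M matrix of log-probabilities (contexts by vocabulary words) for a single Softmax, whose rank cannot exceed d + 1 because it equals `H @ W.T` minus a per-row constant, and compares it with a mixture of K Softmaxes, the kind of model the paper proposes, whose log-probability matrix is not bound by that limit.

```python
import numpy as np


def log_softmax(z):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def softmax_log_probs(H, W):
    """Log-probabilities of a single Softmax: contexts H (N, d), embeddings W (M, d)."""
    return log_softmax(H @ W.T)


def mixture_log_probs(H_k, prior_logits, W):
    """Log-probabilities of a mixture of K Softmaxes.

    H_k: (K, N, d) one context vector per mixture component (assumed random here).
    prior_logits: (N, K) unnormalized mixture weights per context.
    """
    priors = np.exp(log_softmax(prior_logits))                # (N, K)
    comps = np.stack(
        [np.exp(log_softmax(h @ W.T)) for h in H_k], axis=1
    )                                                         # (N, K, M)
    probs = (priors[:, :, None] * comps).sum(axis=1)          # (N, M)
    return np.log(probs)


rng = np.random.default_rng(0)
N, M, d, K = 200, 100, 8, 4   # contexts, vocabulary size, embedding dim, components

H = rng.normal(size=(N, d))
W = rng.normal(size=(M, d))
A_softmax = softmax_log_probs(H, W)

H_k = rng.normal(size=(K, N, d))
prior_logits = rng.normal(size=(N, K))
A_mixture = mixture_log_probs(H_k, prior_logits, W)

# An explicit absolute tolerance keeps floating-point noise out of the rank count.
rank_softmax = np.linalg.matrix_rank(A_softmax, tol=1e-8)
rank_mixture = np.linalg.matrix_rank(A_mixture, tol=1e-8)

print("single Softmax rank:", rank_softmax, "(bound d + 1 =", d + 1, ")")
print("mixture rank:", rank_mixture)
```

If the true log-probability matrix of natural language has rank well above d + 1, as the abstract's context-dependence argument suggests, no choice of `H` and `W` lets the single Softmax match it, while the mixture is not constrained in the same way.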

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Sigmoid Activation · Tanh Activation · Dropout · Temporal Activation Regularization · Activation Regularization · Weight Tying · Embedding Dropout · Variational Dropout · Long Short-Term Memory · DropConnect