Improving Language Modelling with Noise-contrastive estimation

Farhana Ferdousi Liza; Marek Grzes

arXiv:1709.07758 · cs.CL · September 25, 2017

TL;DR

This paper demonstrates that with proper hyperparameter tuning and a new learning rate schedule, noise-contrastive estimation can effectively scale neural language models to large vocabularies, outperforming existing methods.

Contribution

It introduces the 'search-then-converge' learning rate schedule and provides hyperparameter tuning guidelines for NCE in neural language modeling.
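As a rough illustration of what a 'search-then-converge' schedule looks like, the sketch below uses the commonly cited form eta0 / (1 + t / tau): the rate stays close to eta0 while t is small compared with tau (the search phase) and then decays roughly as 1/t (the converge phase). The function name, the default values of `eta0` and `tau`, and the choice of this particular form are assumptions made for illustration. This summary does not state the paper's exact schedule or the heuristic for using it.

```python
def search_then_converge(step, eta0=1.0, tau=10.0):
    """Return eta0 / (1 + step / tau).

    eta0 and tau are hypothetical defaults, not values from the paper.
    """
    if step < 0 or tau <= 0:
        raise ValueError("step must be >= 0 and tau must be > 0")
    return eta0 / (1.0 + step / tau)


# Learning rate at a few (hypothetical) epochs.
rates = [search_then_converge(t) for t in (0, 10, 20, 30)]
```

A larger `tau` keeps the rate near `eta0` for longer before the decay takes over, so in this form it is the main parameter to tune.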

Findings

1. NCE can outperform state-of-the-art models with proper tuning.
2. The 'search-then-converge' schedule improves NCE training stability.
3. Hyperparameters like dropout and initialization significantly affect NCE performance.

Abstract

Neural language models do not scale well when the vocabulary is large. Noise-contrastive estimation (NCE) is a sampling-based method that allows for fast learning with large vocabularies. Although NCE has shown promising performance in neural machine translation, it was considered to be an unsuccessful approach for language modelling. A sufficient investigation of the hyperparameters in the NCE-based neural language models was also missing. In this paper, we showed that NCE can be a successful approach in neural language modelling when the hyperparameters of a neural network are tuned appropriately. We introduced the 'search-then-converge' learning rate schedule for NCE and designed a heuristic that specifies how to use this schedule. The impact of the other important hyperparameters, such as the dropout rate and the weight initialisation range, was also demonstrated. We showed that…
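To make the sampling idea concrete, here is a minimal sketch of the standard NCE objective for a single target word, written in plain Python. It treats the model's score as an unnormalised log-probability (the usual self-normalisation assumption, which is an assumption here and not something this summary confirms for the paper) and asks a binary classifier to tell the observed word apart from k words drawn from a noise distribution. All names and arguments are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def nce_loss(target_score, target_noise_prob, noise_scores, noise_probs):
    """NCE loss for one observed word and k sampled noise words.

    target_score / noise_scores: unnormalised log scores from the model.
    target_noise_prob / noise_probs: probabilities of the same words under
    the noise distribution (for example, a unigram distribution).
    """
    k = len(noise_scores)
    if k == 0 or k != len(noise_probs):
        raise ValueError("need k >= 1 noise samples with matching probabilities")
    # Logit that the observed word came from the data rather than the noise.
    loss = -log_sigmoid(target_score - math.log(k * target_noise_prob))
    for score, prob in zip(noise_scores, noise_probs):
        # log(1 - sigmoid(x)) equals log_sigmoid(-x).
        loss -= log_sigmoid(-(score - math.log(k * prob)))
    return loss
```

The cost of this loss depends on the number of noise samples k and not on the vocabulary size, which is why the approach is attractive for large vocabularies.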

Taxonomy

Methods: Dropout