A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models

Youssef Oualil; Dietrich Klakow

arXiv:1708.05997·cs.CL·August 23, 2017

TL;DR

This paper introduces Batch Noise Contrastive Estimation (B-NCE), which shortens the training of large vocabulary neural network language models by restricting the output vocabulary at each step to the target words in the batch and replacing the softmax with noise contrastive estimation, with no noticeable loss in performance.

Contribution

The paper proposes B-NCE, in which the target words of a batch serve as both targets and noise samples, so the objective can be formulated and implemented entirely with dense matrix operations instead of a softmax over the full vocabulary.

Findings

01 Significant reduction in training time on LTCB and OBWB datasets.

02 No noticeable performance degradation compared to softmax-based models.

03 Established a new baseline for NNLMs on the OBWB dataset.

Abstract

Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of the output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time…
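The batch-level idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the standard NCE logistic objective, takes every other target in the batch as a noise sample (so k = B - 1), and masks out noise entries that happen to equal the row's own target. The function name `batch_nce_loss` and all parameter names are hypothetical.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def batch_nce_loss(hidden, out_emb, out_bias, targets, noise_logprob):
    """Sketch of a batch NCE loss (assumed formulation, not the paper's code).

    hidden:        (B, d) hidden states, one per prediction in the batch
    out_emb:       (V, d) output word embeddings
    out_bias:      (V,)   output biases
    targets:       (B,)   integer target word ids
    noise_logprob: (V,)   log-probabilities of the noise distribution
    """
    B = hidden.shape[0]
    if B < 2:
        raise ValueError("need at least two examples to obtain noise samples")
    k = B - 1  # assumption: all other batch targets act as noise samples

    # Restrict the output layer to the batch's target words.
    W = out_emb[targets]        # (B, d)
    b = out_bias[targets]       # (B,)

    # One dense product scores every batch word in every batch context:
    # scores[i, j] is the unnormalized score of word targets[j] in context i.
    scores = hidden @ W.T + b   # (B, B)

    # NCE logit: score minus log(k * p_noise(word)), broadcast over columns.
    logits = scores - (np.log(k) + noise_logprob[targets])

    # Diagonal entries are the true targets.
    pos = log_sigmoid(np.diag(logits))

    # Off-diagonal entries are noise; log(1 - sigmoid(x)) = log_sigmoid(-x).
    # Entries whose word equals the row's target are masked out (assumption).
    same_word = targets[:, None] == targets[None, :]
    neg = np.where(same_word, 0.0, log_sigmoid(-logits)).sum(axis=1)

    return -(pos + neg).mean()
```

Under these assumptions the cost per step is dominated by a single (B, d) by (d, B) product, independent of the vocabulary size V, which is the source of the speed-up the abstract describes.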

Code & Models

Repositories

Stonesjtu/Pytorch-NCE
pytorch

Taxonomy

Methods: Softmax