Self-Normalized Importance Sampling for Neural Language Modeling

Zijian Yang; Yingbo Gao; Alexander Gerstenberger; Jintao Jiang; Ralf Schlüter; Hermann Ney

arXiv:2111.06310·cs.CL·June 20, 2022

TL;DR

This paper introduces self-normalized importance sampling for neural language modeling, reducing computational costs while maintaining competitive performance in speech recognition tasks.

Contribution

It proposes a self-normalized importance sampling method that eliminates the need for correction steps in sampling-based training criteria.

Findings

01. Competitive perplexity and word error rate performance

02. Faster training and testing in large-vocabulary models

03. Effective in both research and production speech recognition

Abstract

To mitigate the problem of having to traverse the full vocabulary in the softmax normalization of a neural language model, sampling-based training criteria are proposed and investigated in the context of large-vocabulary word-based neural language models. These training criteria typically enjoy the benefit of faster training and testing, at the cost of slightly degraded performance in terms of perplexity and almost no visible drop in word error rate. While noise contrastive estimation is one of the most popular choices, we recently showed that other sampling-based criteria can also perform well, as long as an extra correction step is done, where the intended class posterior probability is recovered from the raw model outputs. In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered in this work are self-normalized and there is…
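To make the cost that the abstract refers to concrete, here is a minimal sketch comparing an exact log-softmax, which touches every vocabulary entry, with a textbook importance-sampled estimate of the normalizer computed from a few draws of a noise distribution. This is a generic illustration, not the self-normalized criteria proposed in the paper, whose definitions are not part of this excerpt. The names `full_log_prob`, `sampled_log_prob`, `noise_probs`, and `num_samples` are hypothetical and chosen only for this example.

```python
import math
import random


def full_log_prob(scores, target):
    """Exact log-softmax of `target`; the sum runs over the whole vocabulary."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z


def sampled_log_prob(scores, target, noise_probs, num_samples, rng):
    """Approximate log-softmax using an importance-sampled normalizer.

    Z = sum_w exp(s_w) = E_{w ~ q}[exp(s_w) / q(w)], so Z is estimated by
    averaging exp(s_w) / q(w) over `num_samples` draws from the noise
    distribution q (`noise_probs`). Only the sampled scores are read.
    Assumption: q assigns non-zero probability to every word.
    """
    vocab = range(len(scores))
    samples = rng.choices(vocab, weights=noise_probs, k=num_samples)
    m = max(scores[w] for w in samples)  # stabilize the exponentials
    z_hat = sum(math.exp(scores[w] - m) / noise_probs[w] for w in samples) / num_samples
    return scores[target] - (m + math.log(z_hat))


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 50
    scores = [0.1 * i for i in range(vocab_size)]  # toy model outputs
    uniform = [1.0 / vocab_size] * vocab_size      # toy noise distribution
    print("exact  :", full_log_prob(scores, 7))
    print("sampled:", sampled_log_prob(scores, 7, uniform, 2000, rng))
```

The sampled version reads only `num_samples` scores instead of all of them, which is where the speed-up of sampling-based approaches comes from. The estimate is noisy, and its quality depends on how well the noise distribution matches the model.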

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques

Methods: Softmax