Self-Normalized Importance Sampling for Neural Language Modeling
Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf, Schl\"uter, Hermann Ney

TL;DR
This paper introduces self-normalized importance sampling for neural language modeling, reducing computational costs while maintaining competitive performance in speech recognition tasks.
Contribution
It proposes a self-normalized importance sampling method that eliminates the need for correction steps in sampling-based training criteria.
Findings
Competitive perplexity and word error rate performance
Faster training and testing in large vocabulary models
Effective in both research and production speech recognition
Abstract
To mitigate the problem of having to traverse over the full vocabulary in the softmax normalization of a neural language model, sampling-based training criteria are proposed and investigated in the context of large vocabulary word-based neural language models. These training criteria typically enjoy the benefit of faster training and testing, at a cost of slightly degraded performance in terms of perplexity and almost no visible drop in word error rate. While noise contrastive estimation is one of the most popular choices, recently we show that other sampling-based criteria can also perform well, as long as an extra correction step is done, where the intended class posterior probability is recovered from the raw model outputs. In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered in this work are self-normalized and there is…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
MethodsSoftmax
