Learning neural trans-dimensional random field language models with noise-contrastive estimation
Bin Wang, Zhijian Ou

TL;DR
This paper introduces improved training techniques for neural trans-dimensional random field language models, combining exponential tilting, noise-contrastive estimation, and deep neural networks to enhance scalability and performance in speech recognition.
Contribution
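The paper reformulates neural TRF LMs as an exponential tilting of a reference distribution and uses noise-contrastive estimation to jointly estimate the model parameters and normalization constants, improving training efficiency and accuracy over previous approaches.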
Findings
Trained on a 40x larger training set in only 1/3 of the training time
Reduced word error rate by a relative 4.7% over a strong LSTM baseline
Enhanced neural TRF LMs with deep CNN and bidirectional LSTM features
Abstract
Trans-dimensional random field language models (TRF LMs), where sentences are modeled as a collection of random fields, have shown performance close to that of LSTM LMs in speech recognition and are computationally more efficient in inference. However, the training efficiency of neural TRF LMs is not satisfactory, which limits the scalability of TRF LMs to large training corpora. In this paper, several techniques in both model formulation and parameter estimation are proposed to improve the training efficiency and the performance of neural TRF LMs. First, TRFs are reformulated in the form of exponential tilting of a reference distribution. Second, noise-contrastive estimation (NCE) is introduced to jointly estimate the model parameters and normalization constants. Third, we extend the neural TRF LMs by combining the deep convolutional neural network (CNN) and the bidirectional LSTM into the…
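The first two techniques in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the tilted model takes the form log p(l, x) = log pi_l + log q(x) + phi(x) - zeta_l, where q is the reference distribution, phi is the learned potential, pi_l is a length prior, and zeta_l is a free parameter standing in for the log normalization constant of length l. The function names, the argument layout, and the noise ratio nu are all hypothetical choices made for this illustration.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def tilted_log_prob(phi, log_q, log_pi_l, zeta_l):
    """Log-probability of a sentence under an exponentially tilted model.

    Assumed form: pi_l * q(x) * exp(phi(x)) / Z_l, with zeta_l treated as a
    trainable estimate of log Z_l rather than a computed constant.
    """
    return log_pi_l + log_q + phi - zeta_l


def nce_loss(data, noise, nu):
    """Negative NCE objective for one batch.

    data, noise: lists of (log_p_model, log_p_noise) pairs for sentences drawn
    from the training set and from the noise distribution, respectively.
    nu: assumed ratio of noise samples to data samples.
    """
    log_nu = math.log(nu)
    # P(sample is data | x) = sigmoid(log p_model - log p_noise - log nu)
    data_term = sum(log_sigmoid(lm - ln - log_nu) for lm, ln in data)
    # P(sample is noise | x) is the complementary sigmoid
    noise_term = sum(log_sigmoid(-(lm - ln - log_nu)) for lm, ln in noise)
    # Both sums are normalized by the number of data samples
    return -(data_term + noise_term) / len(data)


# Usage: one data sentence and one noise sentence, made-up scores.
lp_data = tilted_log_prob(phi=1.5, log_q=-12.0, log_pi_l=-2.0, zeta_l=0.3)
lp_noise = tilted_log_prob(phi=-0.8, log_q=-11.0, log_pi_l=-2.0, zeta_l=0.3)
loss = nce_loss(data=[(lp_data, -14.0)], noise=[(lp_noise, -13.0)], nu=1)
```

Because zeta_l enters the loss like any other parameter, a gradient-based optimizer can update it together with the parameters of phi, which is the sense in which NCE estimates the model parameters and normalization constants jointly.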
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
