Bayesian Neural Network Language Modeling for Speech Recognition
Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng

TL;DR
This paper introduces a Bayesian framework for neural network language models such as LSTM-RNNs and Transformers. It improves speech recognition by modeling uncertainty and by using neural architecture search to select which model components are Bayesian learned, which leads to better performance on challenging tasks.
Contribution
It proposes a comprehensive Bayesian learning approach for neural language models, incorporating uncertainty modeling, neural architecture search, and efficient inference to enhance speech recognition accuracy.
Findings
Achieved consistent perplexity and WER improvements over baseline models.
Significant WER reductions of up to 1.3% absolute on LRS2 data.
Efficient Bayesian inference using as few as one Monte Carlo parameter sample.
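To illustrate what Bayesian inference with a small number of Monte Carlo samples means for a language model, here is a minimal sketch. It assumes a mean-field Gaussian posterior over the weights of a single softmax output layer. The word distribution is estimated by averaging softmax outputs over K sampled weight matrices, with K = 1 as the cheapest case. The function names, the single-layer setup, and the N(0, 1) prior are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sample_weights(mu, log_sigma, rng):
    # Reparameterization trick: w = mu + sigma * eps, with eps ~ N(0, 1).
    return [[m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu_row, ls_row)]
            for mu_row, ls_row in zip(mu, log_sigma)]


def mc_predictive(h, mu, log_sigma, num_samples=1, seed=0):
    # Hypothetical helper: Monte Carlo estimate of P(word | h), obtained by
    # averaging the softmax outputs of num_samples sampled weight matrices.
    rng = random.Random(seed)
    avg = [0.0] * len(mu)
    for _ in range(num_samples):
        w = sample_weights(mu, log_sigma, rng)
        logits = [sum(wi * hi for wi, hi in zip(row, h)) for row in w]
        avg = [a + p / num_samples for a, p in zip(avg, softmax(logits))]
    return avg


def gaussian_kl(mu, log_sigma, prior_sigma=1.0):
    # KL(N(mu, sigma^2) || N(0, prior_sigma^2)), summed over all weights.
    # In variational training this term is subtracted from the expected
    # log-likelihood to form the evidence lower bound (ELBO).
    kl = 0.0
    for mu_row, ls_row in zip(mu, log_sigma):
        for m, ls in zip(mu_row, ls_row):
            var = math.exp(2.0 * ls)
            kl += math.log(prior_sigma) - ls + (var + m * m) / (2.0 * prior_sigma ** 2) - 0.5
    return kl


if __name__ == "__main__":
    h = [0.5, -1.0, 0.2]  # toy hidden state from the LM body
    mu = [[0.1, 0.2, -0.3], [0.0, 0.5, 0.1], [-0.2, 0.3, 0.4], [0.3, -0.1, 0.0]]
    log_sigma = [[-3.0] * 3 for _ in mu]  # small posterior variances
    print(mc_predictive(h, mu, log_sigma, num_samples=1))
    print(gaussian_kl(mu, log_sigma))
```

When the posterior variances are small, a single sample already lands close to the mean-weight prediction. This is one intuition for why very few samples can suffice at test time.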
Abstract
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. They are prone to overfitting and poor generalization when given limited training data. To this end, an overarching full Bayesian learning framework encompassing three methods is proposed in this paper to account for the underlying uncertainty in LSTM-RNN and Transformer LMs. The uncertainty over their model parameters, choice of neural activations and hidden output representations are modeled using Bayesian, Gaussian Process and variational LSTM-RNN or Transformer LMs respectively. Efficient inference approaches were used to automatically select the optimal network internal components to be Bayesian learned using neural architecture search. A minimal number of Monte Carlo parameter samples as low as one was…
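The abstract says the Gaussian Process component models uncertainty over the choice of hidden activation. The sketch below assumes one formulation: the activation is a weighted sum of fixed basis functions, and the combination weights have a Gaussian posterior sampled Monte Carlo style. The basis set, parameter names, and helper functions are hypothetical, and this page does not confirm that the paper uses exactly this parameterization.

```python
import math
import random


def stable_sigmoid(x):
    # Sigmoid that avoids overflow for large negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical basis activations; the paper's actual basis set is not given here.
BASIS = {
    "relu": lambda x: max(0.0, x),
    "sigmoid": stable_sigmoid,
    "sin": math.sin,
    "tanh": math.tanh,
}


def gp_activation(x, mu, log_sigma, num_samples=1, seed=0):
    # Monte Carlo estimate of sum_k lambda_k * phi_k(x), where each
    # combination weight lambda_k ~ N(mu[k], exp(log_sigma[k])^2).
    rng = random.Random(seed)
    names = sorted(BASIS)
    total = 0.0
    for _ in range(num_samples):
        lam = {n: mu[n] + math.exp(log_sigma[n]) * rng.gauss(0.0, 1.0) for n in names}
        total += sum(lam[n] * BASIS[n](x) for n in names)
    return total / num_samples


if __name__ == "__main__":
    mu = {"relu": 0.2, "sigmoid": 0.1, "sin": 0.0, "tanh": 0.7}
    log_sigma = {n: -2.0 for n in BASIS}
    print(gp_activation(0.3, mu, log_sigma, num_samples=1))
```

If all posterior variances shrink toward zero, this reduces to a fixed, learned mixture of activations. The Gaussian weights are what let the model express uncertainty about which activation fits the data best.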
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Softmax · Layer Normalization · Dropout · Dense Connections · Adam · Position-Wise Feed-Forward Layer
