Bayesian Neural Network Language Modeling for Speech Recognition

Boyang Xue; Shoukang Hu; Junhao Xu; Mengzhe Geng; Xunying Liu; Helen Meng

arXiv:2208.13259 · cs.CL · August 30, 2022 · 1 citation

TL;DR

This paper introduces a Bayesian learning framework for neural network language models such as LSTM-RNNs and Transformers. It improves speech recognition by modeling uncertainty in these models and by using neural architecture search to choose which model components to learn in a Bayesian way, yielding lower perplexity and word error rate.
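
One kind of uncertainty the abstract says the paper models is uncertainty over the LM parameters. A common way to do this, assumed here rather than taken from this page, is a mean-field Gaussian posterior over each weight, trained with the reparameterization trick and a KL penalty toward a Gaussian prior. The pure-Python sketch below shows one such layer; the class name `BayesianLinear` and its `mu`, `rho`, and `prior_std` parameters are hypothetical and are not the API of the amourwaltz/bayeslms repository.

```python
import math
import random


def softplus(x):
    """log(1 + e^x): maps an unconstrained rho to a positive standard deviation."""
    return math.log1p(math.exp(x))


class BayesianLinear:
    """Linear layer with a mean-field Gaussian posterior q(W) = N(mu, sigma^2).

    Hypothetical sketch: the names (mu, rho, prior_std) are illustrative and
    are not taken from the authors' code.
    """

    def __init__(self, n_in, n_out, prior_std=1.0, seed=0):
        self.rng = random.Random(seed)
        self.prior_std = prior_std
        # Variational mean and unconstrained scale for every weight.
        self.mu = [[self.rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.rho = [[-3.0] * n_in for _ in range(n_out)]

    def sample_weights(self):
        # Reparameterization trick: W = mu + sigma * eps, with eps ~ N(0, 1).
        return [
            [m + softplus(r) * self.rng.gauss(0.0, 1.0) for m, r in zip(mu_row, rho_row)]
            for mu_row, rho_row in zip(self.mu, self.rho)
        ]

    def forward(self, x):
        # Each call draws a fresh weight sample, so outputs vary between calls.
        w = self.sample_weights()
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

    def kl_to_prior(self):
        # Closed-form KL( N(m, s^2) || N(0, p^2) ) summed over all weights.
        p = self.prior_std
        kl = 0.0
        for mu_row, rho_row in zip(self.mu, self.rho):
            for m, r in zip(mu_row, rho_row):
                s = softplus(r)
                kl += math.log(p / s) + (s * s + m * m) / (2.0 * p * p) - 0.5
        return kl
```

In this kind of variational training, `kl_to_prior()` is added to the negative log-likelihood of the training text to form the negative evidence lower bound. Because every `forward` call draws new weights, two calls on the same input generally return different outputs.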

Contribution

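It proposes a full Bayesian learning framework for neural language models that models uncertainty over model parameters, the choice of neural activations, and hidden output representations. Neural architecture search selects which internal components to learn in a Bayesian way, and inference is kept efficient by using few Monte Carlo samples, improving speech recognition accuracy.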
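
For the efficient inference mentioned above, the abstract states that as few as one Monte Carlo parameter sample was used. The sketch below shows the generic Monte Carlo estimate p(w | h) ≈ (1/K) Σ_k p(w | h, θ_k), assuming, purely for illustration, that only the output layer is Bayesian and that all of its weights share one posterior standard deviation. `mc_predictive` and its arguments are hypothetical names.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predictive(hidden, w_mu, w_sigma, num_samples=1, seed=0):
    """Approximate p(word | history) by averaging over num_samples weight draws.

    hidden:  the LM hidden state for the current history (list of floats).
    w_mu:    mean output-layer weights, one row per vocabulary word.
    w_sigma: one shared posterior standard deviation (simplifying assumption).
    """
    rng = random.Random(seed)
    avg = [0.0] * len(w_mu)
    for _ in range(num_samples):
        # Draw one weight sample W ~ N(w_mu, w_sigma^2) and compute logits W @ hidden.
        logits = [
            sum((m + w_sigma * rng.gauss(0.0, 1.0)) * h for m, h in zip(row, hidden))
            for row in w_mu
        ]
        # Accumulate the running average of the per-sample softmax outputs.
        for v, p in enumerate(softmax(logits)):
            avg[v] += p / num_samples
    return avg
```

Setting `num_samples=1` corresponds to the single-sample regime the abstract describes; larger values trade extra forward passes for a lower-variance estimate of the predictive distribution.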

Findings

01

Achieved consistent perplexity and word error rate (WER) improvements over the baseline models.

02

Significant WER reductions of up to 1.3% absolute on the LRS2 data.

03

Efficient Bayesian inference using as few as one Monte Carlo parameter sample.
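
The findings are stated in perplexity and WER, with the WER gain quoted in absolute terms. The helpers below use the standard textbook definitions of these metrics; they are not the paper's scoring pipeline, and the function names are hypothetical. `wer_reduction` also shows how an absolute reduction relates to a relative one (the absolute gain divided by the baseline).

```python
import math


def perplexity(token_log_probs):
    """PPL = exp(-(1/N) * sum_i ln p(w_i | w_1..w_{i-1})) over N scored tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Computed with a word-level Levenshtein dynamic program over whitespace tokens.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or a match at zero cost
            )
        prev = cur
    return prev[-1] / len(ref)


def wer_reduction(baseline_wer, new_wer):
    """Return (absolute, relative) reduction; both inputs use the same units."""
    absolute = baseline_wer - new_wer
    return absolute, absolute / baseline_wer
```

For example, a drop from a hypothetical 20.0% to 18.0% WER is 2.0 points absolute and 10% relative; these numbers are illustrative only and are not results from the paper.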

Abstract

State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. They are prone to overfitting and poor generalization when given limited training data. To this end, an overarching full Bayesian learning framework encompassing three methods is proposed in this paper to account for the underlying uncertainty in LSTM-RNN and Transformer LMs. The uncertainty over their model parameters, choice of neural activations and hidden output representations is modeled using Bayesian, Gaussian Process and variational LSTM-RNN or Transformer LMs, respectively. Efficient inference approaches were used to automatically select, using neural architecture search, the optimal internal network components to be learned in a Bayesian way. A minimal number of Monte Carlo parameter samples as low as one was…
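
The abstract names two further sources of uncertainty: the choice of neural activation (Gaussian Process LMs) and the hidden output representation (variational LMs). This page does not give the constructions, so the sketch below assumes common ones: a Gaussian-process activation built as a Gaussian-weighted sum of fixed basis activations, and a hidden output sampled per dimension from a Gaussian with the reparameterization trick. The basis set and the names `gp_activation` and `variational_hidden` are hypothetical.

```python
import math
import random

# Hypothetical basis set; this page does not list the activations the paper uses.
BASIS_FUNCTIONS = (
    math.tanh,
    lambda x: 1.0 / (1.0 + math.exp(-x)),  # sigmoid
    lambda x: max(0.0, x),                 # ReLU
    lambda x: x,                           # identity
)


def gp_activation(x, lam_mu, lam_sigma, rng):
    """Draw f(x) = sum_k lambda_k * phi_k(x) with lambda_k ~ N(mu_k, sigma_k^2).

    Gaussian weights over fixed basis functions make f a Gaussian process with
    mean sum_k mu_k * phi_k(x) and covariance sum_k sigma_k^2 * phi_k(x) * phi_k(x').
    """
    lam = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(lam_mu, lam_sigma)]
    return sum(lam_k * phi(x) for lam_k, phi in zip(lam, BASIS_FUNCTIONS))


def variational_hidden(h_mu, h_log_var, rng):
    """Sample a stochastic hidden output h ~ N(h_mu, exp(h_log_var)) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(h_mu, h_log_var)]
```

Shrinking every `lam_sigma` entry to zero turns the GP activation into a deterministic, learned mixture of the basis activations, and a very small `h_log_var` makes the variational hidden output collapse onto its mean.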

Code & Models

Repositories

amourwaltz/bayeslms
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Softmax · Layer Normalization · Dropout · Dense Connections · Adam · Position-Wise Feed-Forward Layer