BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition
Will Rieger

TL;DR
BayesSpeech introduces a Bayesian Transformer model for automatic speech recognition that uses variational inference to learn weight posteriors, resulting in faster training and near state-of-the-art accuracy.
Contribution
The paper presents a novel Bayesian Transformer architecture for speech recognition that incorporates variational inference to model weight uncertainty, enhancing training efficiency and performance.
Findings
Introducing variance in the weights leads to faster training
Achieves near state-of-the-art results on LibriSpeech-960
Introduces a Bayesian approach to Transformer networks for speech recognition
Abstract
Recent developments using End-to-End Deep Learning models have been shown to have near or better performance than state-of-the-art Recurrent Neural Networks (RNNs) on Automatic Speech Recognition tasks. These models tend to be lighter weight and require less training time than traditional RNN-based approaches. However, these models take a frequentist approach to weight training. In theory, network weights are drawn from a latent, intractable probability distribution. We introduce BayesSpeech for end-to-end Automatic Speech Recognition. BayesSpeech is a Bayesian Transformer Network where these intractable posteriors are learned through variational inference and the local reparameterization trick without recurrence. We show how the introduction of variance in the weights leads to faster training time and near state-of-the-art performance on LibriSpeech-960.
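To make the abstract's mention of variational inference and the local reparameterization trick concrete, here is a minimal NumPy sketch of a single Bayesian linear layer. It is not taken from the paper: the function names, the factorized Gaussian posterior, the standard-normal prior, and all shapes are illustrative assumptions. The idea shown is that, instead of sampling a weight matrix, the layer samples its pre-activations from the Gaussian that the weight posterior implies.

```python
import numpy as np

def bayes_linear_forward(x, w_mu, w_log_var, rng):
    """Forward pass of a hypothetical Bayesian linear layer.

    Assumes a factorized Gaussian posterior over weights with mean w_mu and
    log-variance w_log_var. With the local reparameterization trick, noise is
    drawn per pre-activation rather than per weight.
    """
    act_mean = x @ w_mu                        # mean of pre-activations
    act_var = (x ** 2) @ np.exp(w_log_var)     # variance of pre-activations
    eps = rng.standard_normal(act_mean.shape)  # one noise draw per output unit
    return act_mean + np.sqrt(act_var + 1e-8) * eps

def kl_to_standard_normal(w_mu, w_log_var):
    """KL(q(w) || N(0, 1)) summed over all weights (assumed prior)."""
    return 0.5 * np.sum(np.exp(w_log_var) + w_mu ** 2 - 1.0 - w_log_var)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))            # batch of 4 inputs, 8 features
w_mu = 0.1 * rng.standard_normal((8, 3))   # posterior means
w_log_var = np.full((8, 3), -6.0)          # small initial posterior variance

out = bayes_linear_forward(x, w_mu, w_log_var, rng)
kl = kl_to_standard_normal(w_mu, w_log_var)
```

In a variational training loop, a KL term like this would typically be added to the task loss with some weighting, so that the optimizer trades data fit against closeness to the prior. How BayesSpeech weights or schedules that term is not stated in the abstract.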
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
Methods: Attention Is All You Need · Dense Connections · Adam · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Multi-Head Attention · Absolute Position Encodings · Dropout · Label Smoothing
