BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition

Will Rieger

arXiv:2301.11276·eess.AS·January 27, 2023

TL;DR

BayesSpeech introduces a Bayesian Transformer model for automatic speech recognition that uses variational inference to learn weight posteriors, resulting in faster training and near state-of-the-art accuracy.

Contribution

The paper presents a novel Bayesian Transformer architecture for speech recognition that incorporates variational inference to model weight uncertainty, enhancing training efficiency and performance.

Findings

1. Faster training times compared to traditional models
2. Achieves near state-of-the-art results on LibriSpeech-960
3. Introduces Bayesian approach to Transformer networks for speech recognition

Abstract

Recent developments using End-to-End Deep Learning models have been shown to have near or better performance than state-of-the-art Recurrent Neural Networks (RNNs) on Automatic Speech Recognition tasks. These models tend to be lighter weight and require less training time than traditional RNN-based approaches. However, these models take a frequentist approach to weight training. In theory, network weights are drawn from a latent, intractable probability distribution. We introduce BayesSpeech for end-to-end Automatic Speech Recognition. BayesSpeech is a Bayesian Transformer Network where these intractable posteriors are learned through variational inference and the local reparameterization trick without recurrence. We show how the introduction of variance in the weights leads to faster training time and near state-of-the-art performance on LibriSpeech-960.
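
As a rough illustration of the local reparameterization trick mentioned in the abstract, the following Python sketch samples the pre-activations of a single linear layer directly, rather than sampling every weight. It also computes the KL term that variational inference typically adds to the training loss. The function names, the factorized Gaussian posterior, and the standard normal prior are assumptions made for illustration; the abstract does not specify the paper's exact parameterization.

```python
import math
import random


def local_reparam_linear(x, mu, log_var, rng):
    """Sample pre-activations of a Bayesian linear layer (hypothetical helper).

    x       : list of d_in input features
    mu      : d_in x d_out nested list, posterior means of the weights
    log_var : d_in x d_out nested list, posterior log-variances of the weights
    rng     : random.Random instance used to draw the noise

    Assumes a factorized Gaussian posterior, so each pre-activation is itself
    Gaussian and can be sampled with one noise draw per output unit.
    """
    d_in, d_out = len(mu), len(mu[0])
    out = []
    for j in range(d_out):
        # Mean of the pre-activation: sum_i x_i * mu_ij
        mean_j = sum(x[i] * mu[i][j] for i in range(d_in))
        # Variance of the pre-activation: sum_i x_i^2 * sigma_ij^2
        var_j = sum(x[i] ** 2 * math.exp(log_var[i][j]) for i in range(d_in))
        # Reparameterized sample: mean + std * eps, with eps ~ N(0, 1)
        eps = rng.gauss(0.0, 1.0)
        out.append(mean_j + math.sqrt(var_j) * eps)
    return out


def kl_to_standard_normal(mu, log_var):
    """KL(q || p) for factorized Gaussian q and an assumed N(0, 1) prior p."""
    kl = 0.0
    for mu_row, lv_row in zip(mu, log_var):
        for m, lv in zip(mu_row, lv_row):
            kl += 0.5 * (math.exp(lv) + m * m - 1.0 - lv)
    return kl


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0]
    mu = [[0.5, -1.0], [0.25, 2.0]]
    log_var = [[-2.0, -2.0], [-2.0, -2.0]]
    print(local_reparam_linear(x, mu, log_var, rng))
    print(kl_to_standard_normal(mu, log_var))
```

Sampling at the pre-activation level needs one noise draw per output unit instead of one per weight, which is the usual motivation for this trick; whether that accounts for the training speedup reported in the abstract is not stated there.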

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing

Methods: Attention Is All You Need · Dense Connections · Adam · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Multi-Head Attention · Absolute Position Encodings · Dropout · Label Smoothing