Reducing Exposure Bias in Training Recurrent Neural Network Transducers
Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltan Tuske

TL;DR
This paper addresses exposure bias in training recurrent neural network transducers (RNNTs) for speech recognition by perturbing the input to the prediction network with SwitchOut and scheduled sampling, leading to improved accuracy and state-of-the-art results on Switchboard.
Contribution
The paper proposes methods to reduce exposure bias in RNNT training: a label-preserving input perturbation to the prediction network, implemented with SwitchOut and with scheduled sampling based on an additional token language model.
Findings
Reducing exposure bias further improves the accuracy of an already high-performance RNNT ASR model.
The approach achieves state-of-the-art results on the 300-hour Switchboard dataset.
Perturbation techniques enhance model generalization.
Abstract
When recurrent neural network transducers (RNNTs) are trained using the typical maximum likelihood criterion, the prediction network is trained only on ground truth label sequences. This leads to a mismatch during inference, known as exposure bias, when the model must deal with label sequences containing errors. In this paper we investigate approaches to reducing exposure bias in training to improve the generalization of RNNT models for automatic speech recognition (ASR). A label-preserving input perturbation to the prediction network is introduced. The input token sequences are perturbed using SwitchOut and scheduled sampling based on an additional token language model. Experiments conducted on the 300-hour Switchboard dataset demonstrate their effectiveness. By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNNT ASR model and obtain…
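The two perturbations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform stand-in language model, and the per-token replacement probability are all hypothetical, and the SwitchOut variant here is a simplification that replaces each token independently rather than first sampling how many positions to corrupt. Only the token sequence fed to the prediction network is perturbed; the target labels used in the loss stay unchanged, which is what "label-preserving" is taken to mean here.

```python
import random

VOCAB_SIZE = 10  # hypothetical vocabulary size for illustration


def switchout_simplified(tokens, vocab_size, replace_prob, rng):
    """Simplified SwitchOut: independently replace each token, with
    probability replace_prob, by a uniformly drawn vocabulary id."""
    return [
        rng.randrange(vocab_size) if rng.random() < replace_prob else t
        for t in tokens
    ]


def scheduled_sampling(tokens, lm_next_token_probs, sample_prob, rng):
    """With probability sample_prob, replace a ground-truth token by one
    sampled from a token language model conditioned on the history."""
    history, out = [], []
    for t in tokens:
        if rng.random() < sample_prob:
            probs = lm_next_token_probs(history)
            t = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        out.append(t)
        # Assumption: the LM conditions on the perturbed history,
        # not on the ground-truth prefix.
        history.append(t)
    return out


def uniform_lm(history):
    """Hypothetical stand-in for a trained token language model."""
    return [1.0 / VOCAB_SIZE] * VOCAB_SIZE


rng = random.Random(0)
targets = [3, 1, 4, 1, 5]  # ground-truth labels, kept intact for the loss

# Perturbed copies that would be fed to the prediction network.
inputs_switchout = switchout_simplified(targets, VOCAB_SIZE, 0.2, rng)
inputs_sampled = scheduled_sampling(targets, uniform_lm, 0.2, rng)
```

In an actual training loop the perturbed sequence would replace the ground-truth history at the prediction network's input, while the transducer loss would still be computed against `targets`.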
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
