Token-Weighted RNN-T for Learning from Flawed Data
Gil Keren, Wei Zhou, Ozlem Kalinli

TL;DR
This paper introduces a token-weighted RNN-T criterion that reduces the impact of transcription errors during training, improving speech recognition accuracy, especially in semi-supervised settings and when training transcriptions contain errors.
Contribution
The paper proposes a novel token-weighted RNN-T objective that mitigates the accuracy loss caused by flawed transcriptions, making models more robust in semi-supervised learning with pseudo-labels and when training on human annotations that contain errors.
Findings
Up to 38% relative accuracy improvement with pseudo-labels.
Recovers 64%-99% of the accuracy loss caused by transcription errors.
Effective in both semi-supervised and error-prone training settings.
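The two headline numbers above are ratios, and the summary does not spell out how they are computed. The sketch below shows one common reading, assuming accuracy is measured by WER, that "relative improvement" means relative WER reduction, and that "recovered accuracy loss" is the share of the WER gap between training on clean and on flawed references that the method closes. These definitions, the function names, and the example numbers are illustrative assumptions, not values or code from the paper.

```python
def relative_wer_reduction(wer_baseline: float, wer_new: float) -> float:
    """Relative WER reduction of a new system over a baseline (assumed definition)."""
    if wer_baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (wer_baseline - wer_new) / wer_baseline


def recovered_fraction(wer_clean: float, wer_flawed: float, wer_weighted: float) -> float:
    """Share of the WER degradation from flawed references that a method recovers.

    wer_clean:    WER of a model trained on error-free references (hypothetical)
    wer_flawed:   WER of a standard RNN-T model trained on flawed references
    wer_weighted: WER of a token-weighted model trained on the same flawed references
    """
    degradation = wer_flawed - wer_clean
    if degradation <= 0:
        raise ValueError("flawed-reference WER must exceed clean-reference WER")
    return (wer_flawed - wer_weighted) / degradation


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    print(relative_wer_reduction(10.0, 6.2))    # ≈ 0.38
    print(recovered_fraction(10.0, 14.0, 11.0))  # → 0.75
```

Under this reading, recovering 100% would mean the model trained on flawed references matches the one trained on clean references.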
Abstract
ASR models are commonly trained with the cross-entropy criterion to increase the probability of a target token sequence. While optimizing the probability of all tokens in the target sequence is sensible, one may want to de-emphasize tokens that reflect transcription errors. In this work, we propose a novel token-weighted RNN-T criterion that augments the RNN-T objective with token-specific weights. The new objective is used for mitigating accuracy loss from transcription errors in the training data, which naturally appear in two settings: pseudo-labeling and human annotation errors. Experimental results show that using our method for semi-supervised learning with pseudo-labels leads to a consistent accuracy improvement, up to 38% relative. We also analyze the accuracy degradation resulting from different levels of WER in the reference transcription, and show that token-weighted RNN-T is…
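The abstract says the criterion augments the RNN-T objective with token-specific weights, but RNN-T marginalizes over alignments, so per-token terms are not directly available. The NumPy sketch below shows one way to obtain them, offered as an assumption rather than the paper's exact formulation. It computes prefix probabilities P(y_1..y_u | x) from the standard RNN-T forward variables, takes their log-differences as per-token log-probabilities, and multiplies each by a weight w_u. The blank index 0, the (T, U+1, V) logit layout, and all function names are hypothetical choices for illustration.

```python
import numpy as np

BLANK = 0  # assumed blank index


def log_softmax(x, axis=-1):
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def rnnt_forward(lp, y):
    """Standard RNN-T forward pass; lp has shape (T, U+1, V) of log-probs."""
    T, U1, _ = lp.shape
    U = len(y)
    assert U1 == U + 1
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive at (t, u) by a blank from frame t-1 or a label from u-1.
            from_blank = alpha[t - 1, u] + lp[t - 1, u, BLANK] if t > 0 else -np.inf
            from_label = alpha[t, u - 1] + lp[t, u - 1, y[u - 1]] if u > 0 else -np.inf
            alpha[t, u] = np.logaddexp(from_blank, from_label)
    log_p = alpha[T - 1, U] + lp[T - 1, U, BLANK]  # final blank ends the sequence
    return alpha, log_p


def token_log_probs(lp, y, alpha, log_p):
    """Per-token log-probs by telescoping prefix probabilities (assumed decomposition)."""
    U = len(y)
    prefix = np.zeros(U + 1)  # log P(empty prefix) = 0
    for u in range(1, U + 1):
        # y_u is emitted exactly once, from some node (t, u-1).
        prefix[u] = np.logaddexp.reduce(alpha[:, u - 1] + lp[:, u - 1, y[u - 1]])
    token_lp = prefix[1:] - prefix[:-1]  # log P(y_u | y_<u, x), each <= 0
    end_lp = log_p - prefix[U]           # log P(sequence ends after y_U | y, x)
    return token_lp, end_lp


def token_weighted_rnnt_loss(logits, y, weights):
    """Hypothetical token-weighted RNN-T loss: -(sum_u w_u * token_lp_u + end_lp)."""
    lp = log_softmax(logits)
    alpha, log_p = rnnt_forward(lp, y)
    token_lp, end_lp = token_log_probs(lp, y, alpha, log_p)
    return -(np.dot(weights, token_lp) + end_lp)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 3, 5))  # T=4 frames, U=2 tokens, V=5 incl. blank
    y = [2, 4]
    # Hypothetical weights, e.g. down-weighting a token suspected to be an error.
    print(token_weighted_rnnt_loss(logits, y, np.array([1.0, 0.3])))
```

Because the per-token terms telescope, setting every weight to 1 recovers the standard RNN-T loss -log P(y|x) exactly. A weight below 1 scales down the contribution, and hence the gradient, of that token's term. Where the weights come from (for example, teacher confidence for pseudo-labels) is left to the caller in this sketch.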
