Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition
Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze

TL;DR
Lead2Gold is a training method for speech recognition that makes use of noisy transcriptions instead of discarding them or using them as is: it models transcription errors explicitly, searches for better transcriptions with a noise-aware beam search, and trains the whole system end-to-end.
Contribution
The paper introduces a differentiable, noise-aware beam search loss for training ASR systems on noisy transcriptions, enabling end-to-end optimization without a forced alignment step.
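To make the idea of a noise-aware loss concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy substitution-only noise model and a small, fixed set of candidate transcriptions standing in for a beam; the names `log_noise_prob`, `noise_aware_loss`, `sub_prob`, and `alphabet_size` are hypothetical. The loss is the negative log of the sum, over candidates, of the model's probability of the candidate times the probability that the noise model turns it into the observed noisy transcription.

```python
import math


def log_noise_prob(noisy, clean, sub_prob, alphabet_size):
    """log P(noisy | clean) under a toy substitution-only noise model.

    Assumption: each letter is kept with probability 1 - sub_prob, or replaced
    by one of the other letters uniformly. Insertions and deletions are not
    modeled, so strings of different lengths get probability zero.
    """
    if len(noisy) != len(clean):
        return -math.inf
    log_keep = math.log(1.0 - sub_prob)
    log_sub = math.log(sub_prob / (alphabet_size - 1))
    return sum(log_keep if n == c else log_sub for n, c in zip(noisy, clean))


def noise_aware_loss(candidates, noisy, sub_prob=0.1, alphabet_size=27):
    """-log sum_y P(y | audio) * P(noisy | y) over candidate transcriptions.

    `candidates` maps each candidate string y to log P(y | audio), as an ASR
    model might score the hypotheses kept in a beam.
    """
    scores = [
        log_p + log_noise_prob(noisy, y, sub_prob, alphabet_size)
        for y, log_p in candidates.items()
    ]
    best = max(scores)
    if best == -math.inf:
        return math.inf  # no candidate can explain the noisy transcription
    # Log-sum-exp for numerical stability.
    return -(best + math.log(sum(math.exp(s - best) for s in scores)))


candidates = {"cat": math.log(0.6), "cut": math.log(0.4)}
print(noise_aware_loss(candidates, "cut"))
```

In this toy setting, a candidate that disagrees with the noisy transcription still contributes to the loss, weighted by how plausible the disagreement is under the noise model. Written with an automatic differentiation framework, the same expression would be differentiable with respect to the candidate log-probabilities.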
Findings
Lead2Gold outperforms baseline models on noisy transcription data.
The method effectively models transcription errors to improve recognition accuracy.
End-to-end training without forced alignment is feasible with Lead2Gold.
Abstract
The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method to train an ASR system that exploits the full potential of noisy transcriptions. Based on a noise model of transcription errors, Lead2Gold searches for better transcriptions of the training data with a beam search that takes this noise model into account. The beam search is differentiable and does not require a forced alignment step, thus the whole system is trained end-to-end. Lead2Gold can be viewed as a new loss function that can be used on top of any sequence-to-sequence deep neural network. We conduct proof-of-concept experiments on noisy transcriptions generated from letter corruptions with different noise levels. We show…
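The abstract mentions noisy transcriptions generated from letter corruptions at different noise levels. The sketch below shows one way such data could be produced. The corruption scheme is an assumption (the exact procedure is not given here): each letter is independently corrupted with probability `noise_level`, and a corrupted letter is substituted, deleted, or followed by an inserted letter. The function name `corrupt_letters` and the `ALPHABET` constant are hypothetical.

```python
import random

# Assumed letter inventory: lowercase letters plus apostrophe.
ALPHABET = "abcdefghijklmnopqrstuvwxyz'"


def corrupt_letters(text, noise_level, rng):
    """Return a copy of `text` with letter-level corruptions.

    Each non-space character is corrupted with probability `noise_level`.
    Spaces are always kept so that word boundaries survive.
    """
    out = []
    for ch in text:
        if ch == " " or rng.random() >= noise_level:
            out.append(ch)
            continue
        op = rng.choice(("substitute", "delete", "insert"))
        if op == "substitute":
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice(ALPHABET))
        # "delete": append nothing
    return "".join(out)


rng = random.Random(0)
for level in (0.0, 0.1, 0.3):
    print(level, corrupt_letters("the cat sat on the mat", level, rng))
```

Passing a seeded `random.Random` instance keeps the corruption reproducible, which makes it easier to compare training runs at the same noise level.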
