Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

Dongji Gao; Hainan Xu; Desh Raj; Leibny Paola Garcia Perera; Daniel Povey; Sanjeev Khudanpur

arXiv:2309.15796 · eess.AS · September 28, 2023 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces Omni-temporal Classification (OTC), a new training method for speech recognition that effectively handles imperfect transcripts with high error rates, improving robustness over traditional methods.

Contribution

The paper proposes OTC, an extension of CTC that incorporates label uncertainties using weighted finite state transducers, enabling robust training with flawed data.
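As a point of reference for the CTC baseline that OTC extends, here is a minimal sketch of the standard CTC forward recursion in plain Python. It is not the OTC criterion and not the k2-fsa/icefall implementation, which is WFST-based. The function names `logsumexp` and `ctc_log_likelihood` and the use of index 0 for the blank symbol are assumptions made for illustration. The recursion sums the probability of every frame-level alignment of one fixed label sequence, so an error in that label sequence changes the set of alignments the model is trained towards.

```python
import math


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_likelihood(log_probs, labels, blank=0):
    """Standard CTC forward algorithm (illustrative, not OTC).

    log_probs: list of T frames, each a list of per-symbol log-probabilities.
    labels:    target symbol ids, without blanks.
    Returns log P(labels | log_probs) summed over all valid alignments.
    """
    # Interleave blanks: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [-math.inf] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]                 # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])     # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(*candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # A path may end in the last label or the trailing blank.
    if num_states > 1:
        return logsumexp(alpha[-1], alpha[-2])
    return alpha[-1]


if __name__ == "__main__":
    # Two frames, three symbols (0 = blank), uniform posteriors.
    uniform = [[math.log(1 / 3)] * 3 for _ in range(2)]
    print(math.exp(ctc_log_likelihood(uniform, [1])))
```

With uniform posteriors over three symbols and two frames, the label sequence `[1]` has three valid alignments (label-label, blank-label, label-blank), each with probability 1/9, so the printed likelihood is approximately 1/3.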

Findings

01 OTC maintains performance with transcripts containing up to 70% errors.

02 OTC outperforms standard CTC on datasets with imperfect transcripts.

03 Training with OTC prevents the performance degradation caused by transcript errors.
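This summary does not state how the error percentage of a transcript is measured. One common convention, assumed here, is the word-level edit distance between a verbatim reference and the flawed transcript, divided by the reference length. The sketch below illustrates that convention only; the helper names `edit_distance` and `transcript_error_rate` and the example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (sub/ins/del cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of an extra word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def transcript_error_rate(reference, flawed):
    """Fraction of reference words that are wrong in the flawed transcript."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref_words, flawed.split()) / len(ref_words)


if __name__ == "__main__":
    verbatim = "the cat sat on the mat"
    non_verbatim = "the cat sit on mat"   # one substitution, one deletion
    print(transcript_error_rate(verbatim, non_verbatim))
```

Under this convention the example transcript has an error rate of 2/6, so a transcript at the 70% level would have roughly seven wrong, missing, or extra words per ten reference words.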

Abstract

Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data. However, human annotators usually perform "non-verbatim" transcription, which can result in poorly trained models. In this paper, we propose Omni-temporal Classification (OTC), a novel training criterion that explicitly incorporates label uncertainties originating from such weak supervision. This allows the model to effectively learn speech-text alignments while accommodating errors present in the training transcripts. OTC extends the conventional CTC objective for imperfect transcripts by leveraging weighted finite state transducers. Through experiments conducted on the LibriSpeech and LibriVox datasets, we demonstrate that training ASR models with OTC avoids performance degradation even with transcripts containing up to 70% errors, a scenario where CTC models fail completely. Our…

Code & Models

Repositories

k2-fsa/icefall
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
