From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data
Ahmed Adel Attia, Dorottya Demszky, Jing Liu, Carol Espy-Wilson

TL;DR
This paper introduces Weakly Supervised Pretraining (WSP), a two-step training approach that leverages 5,000 hours of noisy classroom transcripts along with minimal accurate data to improve speech recognition in low-resource settings.
Contribution
The paper presents WSP, a method that combines supervised pretraining on weak transcripts with fine-tuning on accurate data. It is designed for low-resource classroom ASR, where noisy transcripts are abundant and gold-standard labels are limited.
Findings
WSP outperforms alternative training methods on both synthetic and real weak transcripts.
Pretraining on weak transcripts significantly improves ASR performance.
WSP is effective in real-world low-resource classroom speech scenarios.
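The findings refer to ASR performance without naming a metric here. ASR systems are commonly scored by word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference, divided by the number of reference words. The same measure can express how noisy a weak transcript is relative to a gold one. The following is a minimal sketch; the function name and the example sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein over words).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Hypothetical gold vs. weak transcript of one classroom utterance.
gold = "open your books to page ten"
weak = "open the books to page"
print(round(word_error_rate(gold, weak), 3))  # → 0.333 (1 substitution + 1 deletion over 6 words)
```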
Abstract
Recent progress in speech recognition has relied on models trained on vast amounts of labeled data. However, classroom Automatic Speech Recognition (ASR) faces the real-world challenge of abundant weak transcripts paired with only a small amount of accurate, gold-standard data. In such low-resource settings, high transcription costs make re-transcription impractical. To address this, we ask: what is the best approach when abundant inexpensive weak transcripts coexist with limited gold-standard data, as is the case for classroom speech data? We propose Weakly Supervised Pretraining (WSP), a two-step process where models are first pretrained on weak transcripts in a supervised manner, and then fine-tuned on accurate data. Our results, based on both synthetic and real weak transcripts, show that WSP outperforms alternative methods, establishing it as an effective training methodology for…
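The two-step recipe described in the abstract can be sketched as a toy training loop. This is a minimal sketch under stated assumptions, not the authors' implementation. The one-feature linear `ToyModel`, the helpers `train`, `weakly_supervised_pretraining`, and `make_data`, the epoch counts, the learning rate, and the bias-plus-noise model of "weak" labels are all hypothetical stand-ins for an ASR network, weak classroom transcripts, and gold-standard data.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy example: (feature, target). In real ASR this would be
# (audio, transcript) and the model a speech recognition network.
Example = Tuple[float, float]


@dataclass
class ToyModel:
    """Hypothetical linear model standing in for an ASR network."""
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b


def train(model: ToyModel, data: List[Example], epochs: int, lr: float) -> ToyModel:
    """Plain supervised training: per-example SGD on squared error."""
    for _ in range(epochs):
        for x, y in data:
            err = model.predict(x) - y
            model.w -= lr * err * x
            model.b -= lr * err
    return model


def weakly_supervised_pretraining(
    model: ToyModel,
    weak_data: List[Example],
    gold_data: List[Example],
    pretrain_epochs: int = 1,
    finetune_epochs: int = 50,
    lr: float = 0.05,
) -> ToyModel:
    # Step 1: supervised pretraining on abundant but noisy weak labels.
    train(model, weak_data, pretrain_epochs, lr)
    # Step 2: fine-tune the same (pretrained) weights on the small gold set.
    train(model, gold_data, finetune_epochs, lr)
    return model


def make_data(n: int, rng: random.Random, noise: float, bias: float) -> List[Example]:
    """Hypothetical data: true targets follow y = 2x + 1; weak labels add
    a systematic bias and Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + bias + rng.gauss(0.0, noise)))
    return data


rng = random.Random(0)
weak = make_data(5000, rng, noise=0.5, bias=0.3)  # many cheap, imperfect labels
gold = make_data(20, rng, noise=0.0, bias=0.0)     # few accurate labels
model = weakly_supervised_pretraining(ToyModel(), weak, gold)
print(f"w={model.w:.3f}, b={model.b:.3f}")
```

The structural point is only that fine-tuning starts from the weights learned on weak labels rather than from scratch; the toy does not reproduce or demonstrate the paper's reported gains.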
