CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments
Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson

TL;DR
This paper investigates the use of continued pretraining (CPT) to adapt Wav2vec2.0 for noise-robust speech recognition in classroom environments, significantly reducing error rates under various acoustic conditions.
Contribution
It demonstrates that CPT makes Wav2vec2.0 more robust to different noise types, microphones, and classroom conditions, applying CPT as a domain-adaptation method for classroom ASR.
Findings
CPT reduces the WER of Wav2vec2.0-based models by upwards of 10% in classroom scenarios.
CPT improves robustness to noise and microphone variability.
Wav2vec2.0 with CPT outperforms baseline models in noisy environments.
Abstract
Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of Wav2vec2.0-based models by upwards of 10%. More specifically, CPT improves the model's robustness to different noises, microphones and classroom conditions.
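The abstract reports its gains as a reduction in Word Error Rate. As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the Python below may help. The function names and the example numbers are illustrative assumptions, not taken from the paper. The abstract also does not say whether the 10% figure is an absolute or a relative reduction, so the sketch shows both readings.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only one row of the DP table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Difference in WER, in the same units as the inputs."""
    return baseline_wer - adapted_wer


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Reduction as a fraction of the baseline WER."""
    return (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    ref = "open your books to page ten"
    hyp = "open your book to page"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")

    # Hypothetical WER values, not results from the paper.
    baseline, adapted = 0.40, 0.30
    print(f"absolute reduction: {absolute_reduction(baseline, adapted):.2f}")
    print(f"relative reduction: {relative_reduction(baseline, adapted):.2f}")
```

With these made-up values, a drop from 0.40 to 0.30 is a 10-point absolute reduction but a 25% relative reduction, which is why the distinction matters when comparing reported gains.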
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
