Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings
Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson

TL;DR
This paper demonstrates that continued pretraining of Wav2vec2.0 significantly enhances its robustness and accuracy for automatic speech recognition in diverse elementary classroom environments, addressing noise, microphone variability, and demographic differences.
Contribution
The study shows that continued pretraining effectively adapts Wav2vec2.0 for classroom ASR, reducing error rates and improving generalization to unseen demographics.
Findings
Continued pretraining (CPT) reduces Word Error Rate (WER) by over 10%.
CPT improves robustness to noise and microphone variations.
CPT enhances generalization to unseen classroom demographics.
Abstract
Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of Wav2vec2.0-based models by upwards of 10%. More specifically, CPT improves the model's robustness to different noises, microphones, classroom conditions as well as classroom demographics. Our CPT models show improved ability to generalize to different demographics unseen in the labeled finetuning data.
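Since the headline result is stated in terms of WER, a minimal sketch of how that metric is conventionally computed may help. It counts word-level substitutions, deletions, and insertions via edit distance and divides by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the paper's own scoring pipeline and text normalization are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: splits on whitespace and applies no text
    normalization (casing, punctuation), which real ASR scoring usually does.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two of the five reference words are misrecognized.
    print(word_error_rate("two plus two is four", "two plus to is for"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.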
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems
