Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings

Ahmed Adel Attia; Dorottya Demszky; Tolulope Ogunremi; Jing Liu; Carol Espy-Wilson

arXiv:2405.13018 · cs.CL · May 24, 2024 · 1 citation

TL;DR

This paper demonstrates that continued pretraining of Wav2vec2.0 significantly enhances its robustness and accuracy for automatic speech recognition in diverse elementary classroom environments, addressing noise, microphone variability, and demographic differences.

Contribution

The study shows that continued pretraining effectively adapts Wav2vec2.0 for classroom ASR, reducing error rates and improving generalization to unseen demographics.

Findings

1. Continued pretraining (CPT) reduces Word Error Rate (WER) by over 10%.
2. CPT improves robustness to noise and microphone variations.
3. CPT enhances generalization to unseen classroom demographics.

Abstract

Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of Wav2vec2.0-based models by upwards of 10%. More specifically, CPT improves the model's robustness to different noises, microphones, and classroom conditions, as well as to classroom demographics. Our CPT models also generalize better to demographics unseen in the labeled finetuning data.
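
Since the headline result is stated as a WER reduction, a minimal sketch of how WER is commonly computed may help: word-level edit distance divided by the number of reference words. This is the standard definition, not the authors' evaluation code; the function names `word_error_rate` and `wer_reduction` are hypothetical, and the example numbers are made up for illustration rather than taken from the paper. The abstract does not say here whether "upwards of 10%" is an absolute or a relative reduction, so the helper returns both.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline_wer: float, adapted_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction; hypothetical helper."""
    absolute = baseline_wer - adapted_wer
    relative = absolute / baseline_wer
    return absolute, relative


if __name__ == "__main__":
    # Illustrative transcript pair, not data from the paper.
    print(word_error_rate("two plus two is four", "two plus to is for"))
    # Illustrative WER values, not results from the paper.
    print(wer_reduction(0.50, 0.40))
```

Reporting both figures avoids ambiguity: a drop from 50% to 40% WER is 10 points in absolute terms but 20% in relative terms.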

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems