CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments

Ahmed Adel Attia; Dorottya Demszky; Tolulope Ogunremi; Jing Liu; Carol Espy-Wilson

arXiv:2409.14494 · cs.CL · March 13, 2025

TL;DR

This paper investigates the use of continued pretraining (CPT) to adapt Wav2vec2.0 for noise-robust speech recognition in classroom environments, significantly reducing error rates under various acoustic conditions.

Contribution

It demonstrates that CPT improves Wav2vec2.0's robustness to different noise types, microphones, and classroom conditions, representing a novel application of CPT for domain adaptation in ASR.

Findings

01

CPT reduces the WER of Wav2vec2.0-based models by upwards of 10% in the classroom domain.

02

CPT improves robustness to noise and microphone variability.

03

Wav2vec2.0 with CPT outperforms baseline models in noisy environments.

Abstract

Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of Wav2vec2.0-based models by upwards of 10%. More specifically, CPT improves the model's robustness to different noises, microphones and classroom conditions.
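For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the example sentences are hypothetical, and text normalization (casing, punctuation) is assumed to have been done beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); assumes whitespace tokenization
    and already-normalized text.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("please open your books", "please open books"))  # 0.25
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.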

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing