SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR
Natarajan Balaji Shankar, Ruchao Fan, and Abeer Alwan

TL;DR
This paper proposes Speech Only Adaptation (SOA), a simple domain adaptation method for speech foundation models that improves ASR performance on a target domain using only unlabeled speech from that domain, so no target-domain transcripts need to be collected.
Contribution
The paper introduces SOA, a speech-only adaptation technique for Wav2vec 2.0 that reduces domain mismatch in low-resource ASR without requiring additional labeled data.
Findings
Significant WER improvements on target domains
Preserves source domain performance
Effective in low-resource and domain mismatch settings
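The findings above are stated in terms of word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; the paper's own scoring setup (text normalization, toolkit) is not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.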
Abstract
Recently, speech foundation models have gained popularity due to their superiority in finetuning downstream ASR tasks. However, models finetuned on certain domains, such as LibriSpeech (adult read speech), behave poorly on other domains (child or noisy speech). One solution could be collecting as much labeled and diverse data as possible for joint finetuning on various domains. However, collecting target domain speech-text paired data and retraining the model is often costly and computationally expensive. In this paper, we introduce a simple yet effective method, speech only adaptation (SOA), based on speech foundation models (Wav2vec 2.0), which requires only speech input data from the target domain. Specifically, the Wav2vec 2.0 feature encoder is continually pretrained with the Wav2vec 2.0 loss on both the source and target domain data for domain adaptation, while the contextual…
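The abstract states that continued pretraining uses unlabeled speech from both the source and target domains. One way such mixing could be arranged is sketched below, purely as an illustration: the helper `mixed_domain_batches`, the `target_fraction` parameter, and sampling with replacement are all assumptions, not details taken from the paper, which may combine the two domains differently.

```python
import random
from typing import Iterator, List, Sequence


def mixed_domain_batches(
    source: Sequence[str],
    target: Sequence[str],
    batch_size: int,
    target_fraction: float = 0.5,  # hypothetical knob, not from the paper
    seed: int = 0,
) -> Iterator[List[str]]:
    """Yield endless batches of unlabeled utterance IDs drawn from both domains."""
    if not source or not target:
        raise ValueError("both domains must contain at least one utterance")
    rng = random.Random(seed)
    n_target = round(batch_size * target_fraction)
    n_source = batch_size - n_target
    while True:
        # Sample with replacement so a small target set can still fill its share.
        batch = rng.choices(source, k=n_source) + rng.choices(target, k=n_target)
        rng.shuffle(batch)
        yield batch
```

No transcripts appear anywhere in this sketch, which is the point of a speech-only setup: each batch carries only audio identifiers, and the self-supervised loss would be computed on the audio alone.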
Taxonomy
Topics: Speech Recognition and Synthesis
