DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
Ruchao Fan, Abeer Alwan

TL;DR
This paper introduces DRAFT, a new framework that reduces domain shifting in self-supervised speech models by inserting residual adapters, significantly improving child speech recognition performance across multiple SSL methods.
Contribution
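DRAFT (domain responsible adaptation and finetuning) is a framework that reduces domain shift in pretrained SSL speech models by adding an adaptation stage in which only the inserted residual adapters are trained, using the same SSL loss as in pretraining. It is agnostic to the SSL method used.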
Findings
Up to 19.7% relative WER reduction on child ASR tasks.
DRAFT improves knowledge transfer between adult and child speech datasets.
The framework is compatible with multiple SSL methods, including APC, Wav2vec2.0, and HuBERT.
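For readers unfamiliar with the metric, the sketch below shows how a relative WER reduction is usually computed from a baseline WER and an improved WER. The function name `relative_wer_reduction` and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Illustrative numbers only (not results from the paper):
# a drop from 10.0% WER to 8.0% WER.
reduction = relative_wer_reduction(10.0, 8.0)
print(f"{reduction:.1f}% relative WER reduction")
```

A relative reduction is measured against the baseline, so the same absolute gain counts for more when the baseline WER is already low.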
Abstract
Self-supervised learning (SSL) in the pretraining stage using un-annotated speech data has been successful in low-resource automatic speech recognition (ASR) tasks. However, models trained through SSL are biased to the pretraining data, which is usually different from the data used in finetuning tasks. This causes a domain shifting problem and limits knowledge transfer. We propose a novel framework, domain responsible adaptation and finetuning (DRAFT), to reduce domain shifting in pretrained speech models through an additional adaptation stage. In DRAFT, residual adapters (RAs) are inserted in the pretrained model to learn domain-related information with the same SSL loss as the pretraining stage. Only RA parameters are updated during the adaptation stage. DRAFT is agnostic to the type of SSL method used and is evaluated with three widely used approaches: APC, Wav2vec2.0,…
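The adaptation stage described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the bottleneck form of the adapter (down-projection, ReLU, up-projection, residual connection), the zero-initialised up-projection, the class and function names, the toy weights, and the squared-error loss, which merely stands in for the SSL loss that DRAFT reuses from pretraining. It is not the authors' implementation. It only shows the key property that the pretrained weights stay frozen while the adapter parameters are updated.

```python
def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class FrozenLayer:
    """Stand-in for one pretrained layer; its weights are never updated."""

    def __init__(self, W):
        self.W = W

    def __call__(self, x):
        return [max(0.0, a) for a in matvec(self.W, x)]


class ResidualAdapter:
    """Hypothetical bottleneck adapter: h + W_up @ relu(W_down @ h)."""

    def __init__(self, W_down, dim):
        self.W_down = W_down
        # Zero-initialised up-projection: the adapter starts as the identity.
        self.W_up = [[0.0] * len(W_down) for _ in range(dim)]

    def forward(self, h):
        z = [max(0.0, a) for a in matvec(self.W_down, h)]
        out = [a + b for a, b in zip(h, matvec(self.W_up, z))]
        return out, z


def adaptation_step(layer, adapter, x, target, lr=0.1):
    """One gradient step on L = 0.5 * ||out - target||^2 (a stand-in loss).

    Only adapter.W_up and adapter.W_down are modified; layer.W is read-only.
    """
    h = layer(x)
    out, z = adapter.forward(h)
    err = [o - t for o, t in zip(out, target)]  # dL/d(out)
    # Backpropagate through the up-projection and the ReLU before updating.
    dz = [
        sum(adapter.W_up[i][j] * err[i] for i in range(len(err))) if z[j] > 0 else 0.0
        for j in range(len(z))
    ]
    for i in range(len(err)):
        for j in range(len(z)):
            adapter.W_up[i][j] -= lr * err[i] * z[j]
    for j in range(len(z)):
        for k in range(len(h)):
            adapter.W_down[j][k] -= lr * dz[j] * h[k]
    return 0.5 * sum(e * e for e in err)


# Toy weights and data, chosen only for illustration.
layer = FrozenLayer([[0.5, -0.2], [0.1, 0.4]])
adapter = ResidualAdapter([[0.3, 0.6]], dim=2)
frozen_before = [row[:] for row in layer.W]

x, target = [1.0, 2.0], [0.5, 0.5]
losses = [adaptation_step(layer, adapter, x, target) for _ in range(20)]
print("first loss:", losses[0], "last loss:", losses[-1])
print("pretrained weights unchanged:", layer.W == frozen_before)
```

In a real system the frozen layer would be a pretrained APC, Wav2vec2.0, or HuBERT block and the update would come from an autodiff framework, but the division of labour is the same: gradients flow through the whole network while only the adapter weights change.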
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
