DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR

Ruchao Fan; Abeer Alwan

arXiv:2206.07931 · eess.AS · June 17, 2022

Open Access

TL;DR

This paper introduces DRAFT, a new framework that reduces domain shifting in self-supervised speech models by inserting residual adapters, significantly improving child speech recognition performance across multiple SSL methods.

Contribution

DRAFT (domain responsible adaptation and finetuning) is a novel adaptation framework that reduces domain shift in SSL speech models using residual adapters and is applicable across various SSL techniques.

Findings

01. Up to 19.7% relative WER reduction on child ASR tasks.
02. DRAFT improves knowledge transfer from adult speech data to child speech tasks.
03. The framework is compatible with multiple SSL methods, including APC, Wav2vec2.0, and HuBERT.
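The headline figure is a relative reduction, not an absolute one. Below is a minimal sketch of the standard definition, assuming WER is expressed in percent. The helper name `relative_wer_reduction` and the example values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WERs (in %), not results from the paper.
    print(round(relative_wer_reduction(10.0, 8.03), 1))
```

For example, a hypothetical baseline of 10.0% WER falling to 8.03% is a 19.7% relative reduction, but an absolute drop of under 2 percentage points.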

Abstract

Self-supervised learning (SSL) on un-annotated speech data during pretraining has been successful in low-resource automatic speech recognition (ASR) tasks. However, models trained through SSL are biased toward the pretraining data, which usually differs from the data used in finetuning tasks. The resulting domain shift limits knowledge transfer. We propose a novel framework, domain responsible adaptation and finetuning (DRAFT), to reduce domain shifting in pretrained speech models through an additional adaptation stage. In DRAFT, residual adapters (RAs) are inserted into the pretrained model to learn domain-related information using the same SSL loss as in the pretraining stage. Only RA parameters are updated during the adaptation stage. DRAFT is agnostic to the type of SSL method used and is evaluated with three widely used approaches: APC, Wav2vec2.0,…
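The adaptation stage described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Several details are assumed for illustration:

- the adapter layout (LayerNorm, down-projection, ReLU, up-projection, residual connection);
- the zero-initialized up-projection;
- the parameter-naming scheme used to select trainable weights.

`ResidualAdapter` and `trainable_parameters` are hypothetical names.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean, unit variance (learned scale/shift omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class ResidualAdapter:
    """Assumed bottleneck adapter: y = x + W_up @ ReLU(W_down @ LayerNorm(x))."""

    def __init__(self, dim: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection (dim -> bottleneck) with small random weights.
        self.w_down: Matrix = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)
        ]
        # Up-projection (bottleneck -> dim) initialized to zero, so the
        # adapter starts as an identity map (an assumption, see lead-in).
        self.w_up: Matrix = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x: Vector) -> Vector:
        h = [max(0.0, v) for v in matvec(self.w_down, layer_norm(x))]  # ReLU
        delta = matvec(self.w_up, h)
        return [xi + di for xi, di in zip(x, delta)]  # residual connection

    def num_params(self) -> int:
        return sum(map(len, self.w_down)) + sum(map(len, self.w_up))


def trainable_parameters(param_names: List[str]) -> List[str]:
    """Adaptation stage: keep only adapter weights trainable.

    Assumes (hypothetically) that adapter weights contain '.adapter.' in
    their names; all pretrained backbone weights stay frozen.
    """
    return [name for name in param_names if ".adapter." in name]


if __name__ == "__main__":
    adapter = ResidualAdapter(dim=8, bottleneck=2)
    x = [float(i) for i in range(8)]
    print(adapter(x) == x)       # identity at initialization
    print(adapter.num_params())  # 8*2 + 2*8 weights
```

Zero-initializing the up-projection makes each adapter an identity mapping at the start of adaptation, so the inserted RAs initially leave the pretrained model's outputs unchanged. The SSL loss then updates only the adapter weights, while the backbone stays frozen.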

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems