Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models

Vrunda N. Sukhadia; S. Umesh

arXiv:2202.09167 · eess.AS · May 30, 2023

Open Access

TL;DR

This paper presents a domain adaptation method for low-resource ASR that leverages embeddings from well-trained models' encoder layers, combined with Spectral Augmentation, to significantly improve target-domain recognition performance.
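Spectral Augmentation is named here without detail. Assuming it refers to SpecAugment-style masking, a minimal sketch could zero one random band of frequency bins and one random run of time frames in a time-major spectrogram. The function name `spec_augment`, its parameters, and the default mask widths are illustrative assumptions, not the authors' settings, and time warping is left out.

```python
import random


def spec_augment(spectrogram, max_freq_mask=2, max_time_mask=3, rng=None):
    """Return a masked copy of a time-major spectrogram (list of frames).

    Hypothetical helper: masks one frequency band and one time span with zeros.
    The mask widths are illustrative defaults, not values from the paper.
    """
    rng = rng or random.Random()
    num_frames = len(spectrogram)
    num_bins = len(spectrogram[0])
    masked = [list(frame) for frame in spectrogram]  # copy, input stays intact

    # Draw the width and start of each mask so it always fits inside the input.
    freq_width = rng.randint(0, min(max_freq_mask, num_bins))
    freq_start = rng.randint(0, num_bins - freq_width)
    time_width = rng.randint(0, min(max_time_mask, num_frames))
    time_start = rng.randint(0, num_frames - time_width)

    # Frequency mask: zero the same bins in every frame.
    for t in range(num_frames):
        for f in range(freq_start, freq_start + freq_width):
            masked[t][f] = 0.0

    # Time mask: zero every bin in a run of consecutive frames.
    for t in range(time_start, time_start + time_width):
        masked[t] = [0.0] * num_bins

    return masked
```

The same masking could in principle be applied to any frame-by-dimension feature matrix, including tapped encoder embeddings, since it only assumes a list of equal-length frames.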

Contribution

The study proposes tapping embeddings from the encoder layers of a well-trained ASR model and using them as input features for a downstream target-domain Conformer model, which improves low-resource ASR performance.

Findings

01. 30% relative improvement on LibriSpeech-100-clean data

02. 50% relative improvement on WSJ data

03. Effective combination of encoder embeddings and Spectral Augmentation
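For readers unfamiliar with the term, a relative improvement is the error reduction expressed as a fraction of the baseline error. The sketch below assumes the metric is word error rate (WER), which this page does not state. The WER values in the example are hypothetical numbers chosen to reproduce the percentages, not results from the paper.

```python
def relative_improvement(baseline_wer, adapted_wer):
    """Relative error reduction in percent: 100 * (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WERs for illustration only (not taken from the paper).
print(relative_improvement(10.0, 7.0))  # a drop from 10.0 to 7.0 is 30% relative
print(relative_improvement(8.0, 4.0))   # a drop from 8.0 to 4.0 is 50% relative
```

A 30% relative improvement is therefore not a 30-point absolute drop: the absolute gain depends on how high the baseline error was.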

Abstract

In this paper, we investigate domain adaptation for low-resource Automatic Speech Recognition (ASR) of target-domain data, when a well-trained ASR model trained with a large dataset is available. We argue that in the encoder-decoder framework, the decoder of the well-trained ASR model is largely tuned towards the source-domain, hurting the performance of target-domain models in vanilla transfer-learning. On the other hand, the encoder layers of the well-trained ASR model mostly capture the acoustic characteristics. We, therefore, propose to use the embeddings tapped from these encoder layers as features for a downstream Conformer target-domain model and show that they provide significant improvements. We do ablation studies on which encoder layer is optimal to tap the embeddings, as well as the effect of freezing or updating the well-trained ASR model's encoder layers. We further show…
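The idea of tapping an intermediate encoder layer and feeding its output to a separate target-domain model can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `SourceEncoder`, `TargetModel`, `build_pipeline`, all dimensions, and the choice of layer 4 are hypothetical, and plain `nn.TransformerEncoderLayer` blocks stand in for Conformer blocks. The `freeze_source` flag mirrors the freezing-versus-updating ablation mentioned in the abstract.

```python
import torch
from torch import nn


class SourceEncoder(nn.Module):
    """Stand-in for the well-trained ASR encoder (hypothetical, not a real Conformer)."""

    def __init__(self, feat_dim=80, d_model=64, num_layers=6):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=128,
                                       batch_first=True)
            for _ in range(num_layers)
        ])

    def forward(self, feats, tap_layer):
        """Return the output of encoder layer `tap_layer` (1-indexed)."""
        hidden = self.input_proj(feats)
        for layer in self.layers[:tap_layer]:
            hidden = layer(hidden)
        return hidden


class TargetModel(nn.Module):
    """Small downstream model that consumes the tapped embeddings as features."""

    def __init__(self, d_model=64, vocab_size=32):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4,
                                                dim_feedforward=128,
                                                batch_first=True)
        self.output = nn.Linear(d_model, vocab_size)

    def forward(self, embeddings):
        return self.output(self.block(embeddings))


def build_pipeline(freeze_source=True):
    """Create both models; optionally freeze the source encoder's weights."""
    source, target = SourceEncoder(), TargetModel()
    if freeze_source:
        source.requires_grad_(False)  # no gradient updates to the source encoder
        source.eval()                 # disable dropout in the frozen encoder
    return source, target


torch.manual_seed(0)
source, target = build_pipeline(freeze_source=True)
feats = torch.randn(2, 50, 80)           # random placeholder: (batch, frames, features)
embeddings = source(feats, tap_layer=4)  # tap after the 4th layer (arbitrary choice)
logits = target(embeddings)              # per-frame scores from the target model
```

Passing `freeze_source=False` leaves the source encoder trainable, so an optimizer built over both modules' parameters would update it together with the target model.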

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing