Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages
Anoop C S, Prathosh A P, A G Ramakrishnan

TL;DR
This paper demonstrates that unsupervised domain adaptation techniques, specifically adversarial training and domain separation networks, can significantly improve low-resource language ASR performance using high-resource language data.
Contribution
It introduces and evaluates two UDA architectures for building ASR in low-resource languages, showing notable improvements over baseline models.
Findings
The GRL and DSN architectures reduce WER by 6.71% and 7.32% absolute, respectively, over the baseline.
Proper source language selection enhances adaptation performance.
UDA schemes reduce the need for large annotated datasets in low-resource languages.
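The findings are stated as changes in word error rate (WER). WER is the word-level Levenshtein distance (substitutions + deletions + insertions) between the reference and the hypothesis, divided by the number of reference words. An "absolute" improvement is the difference in percentage points between the baseline WER and the adapted WER. The sketch below is a minimal illustration of the metric, not the authors' scoring script. The function name `word_error_rate` is hypothetical, and the sketch assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level edit distance divided by reference length.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = minimum edits to turn the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[len(hyp)] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because the reported gains are absolute, a 6.71% improvement means the adapted system's WER is 6.71 percentage points lower than the baseline's. It does not mean the baseline WER is reduced by 6.71% of its value.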
Abstract
Building an automatic speech recognition (ASR) system from scratch requires a large amount of annotated speech data, which is difficult to collect in many languages. However, there are cases where the low-resource language shares a common acoustic space with a high-resource language having enough annotated data to build an ASR. In such cases, we show that the domain-independent acoustic models learned from the high-resource language through unsupervised domain adaptation (UDA) schemes can enhance the performance of the ASR in the low-resource language. We use the specific example of Hindi in the source domain and Sanskrit in the target domain. We explore two architectures: i) domain adversarial training using gradient reversal layer (GRL) and ii) domain separation networks (DSN). The GRL and DSN architectures give absolute improvements of 6.71% and 7.32%, respectively, in word error…
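The abstract names the two UDA architectures but does not describe their internals. The sketch below uses plain Python with no deep-learning framework to show two standard building blocks from the wider domain-adaptation literature. The first is the gradient reversal layer used in domain adversarial training. It is the identity on the forward pass and multiplies the incoming gradient by -λ on the backward pass, so the feature extractor is pushed to confuse the domain classifier. The second is the soft orthogonality ("difference") penalty ||H_c^T H_p||_F^2, which domain separation networks use to keep shared and private representations apart. Several pieces are illustrative assumptions and are not taken from the paper: the scalar features, the 1-D logistic domain classifier, and the helper names. Whether the authors use exactly this form of the difference loss is also an assumption.

```python
import math


class GradientReversal:
    """Identity on the forward pass; scales the incoming gradient by -lambd on the backward pass."""

    def __init__(self, lambd: float = 1.0):
        self.lambd = lambd

    def forward(self, x: float) -> float:
        return x

    def backward(self, grad_output: float) -> float:
        return -self.lambd * grad_output


def domain_loss_grad_wrt_feature(feature: float, v: float, is_source: int) -> float:
    """Hypothetical toy domain classifier: p = sigmoid(v * feature), trained with
    binary cross-entropy (label 1 = source, 0 = target). Returns dBCE/dfeature."""
    p = 1.0 / (1.0 + math.exp(-v * feature))
    # dBCE/dlogit = p - y, and dlogit/dfeature = v
    return (p - is_source) * v


def difference_loss(shared, private):
    """DSN-style soft orthogonality penalty ||H_c^T H_p||_F^2 (assumed form).

    shared: n rows of d_c values, private: n rows of d_p values, one row per sample.
    """
    n = len(shared)
    total = 0.0
    for i in range(len(shared[0])):
        for j in range(len(private[0])):
            # entry (i, j) of H_c^T H_p
            dot = sum(shared[k][i] * private[k][j] for k in range(n))
            total += dot * dot
    return total


grl = GradientReversal(lambd=0.5)
g = domain_loss_grad_wrt_feature(grl.forward(2.0), v=1.0, is_source=1)
# The gradient reaching the feature extractor has the opposite sign of g.
print(g, grl.backward(g))
# Orthogonal shared/private columns incur no penalty.
print(difference_loss([[1.0], [0.0]], [[0.0], [1.0]]))  # 0.0
```

In a full DSN, this penalty is one term in a weighted sum that also includes the recognition loss, a reconstruction loss, and a similarity loss on the shared representations. The similarity loss can itself be implemented adversarially with a gradient reversal layer.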
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
