Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages

Anoop C S; Prathosh A P; A G Ramakrishnan

arXiv:2109.05494 · cs.CL · September 17, 2021 · 5 cites

Open Access

TL;DR

This paper demonstrates that unsupervised domain adaptation techniques, specifically adversarial training and domain separation networks, can significantly improve low-resource language ASR performance using high-resource language data.

Contribution

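It applies and evaluates two UDA architectures, domain adversarial training with a gradient reversal layer (GRL) and domain separation networks (DSN), for building ASR in a low-resource language (Sanskrit) from a high-resource source language (Hindi), and reports improvements over baseline models.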

Findings

01

The GRL and DSN architectures give absolute WER improvements of 6.71% and 7.32%, respectively.

02

Proper source language selection enhances adaptation performance.

03

UDA schemes reduce the need for large annotated datasets in low-resource languages.
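
The findings above are stated as absolute changes in word error rate (WER). As a minimal sketch of what that metric measures, the snippet below computes WER as word-level edit distance divided by reference length, and contrasts absolute with relative improvement. The function names and the toy English sentences are illustrative assumptions; they are not data or code from the paper.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Difference in WER, expressed in percentage points."""
    return 100.0 * (baseline_wer - adapted_wer)


def relative_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Reduction in WER as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Toy example only: hypothetical transcripts, not results from the paper.
reference = "the cat sat down"
baseline = word_error_rate(reference, "the hat sat")        # 1 substitution + 1 deletion
adapted = word_error_rate(reference, "the cat sat town")    # 1 substitution
print(baseline, adapted)
print(absolute_improvement(baseline, adapted), relative_improvement(baseline, adapted))
```

An absolute improvement is a subtraction of two WER values, so the same absolute gain corresponds to a larger relative gain when the baseline WER is lower.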

Abstract

Building an automatic speech recognition (ASR) system from scratch requires a large amount of annotated speech data, which is difficult to collect in many languages. However, there are cases where the low-resource language shares a common acoustic space with a high-resource language having enough annotated data to build an ASR. In such cases, we show that the domain-independent acoustic models learned from the high-resource language through unsupervised domain adaptation (UDA) schemes can enhance the performance of the ASR in the low-resource language. We use the specific example of Hindi in the source domain and Sanskrit in the target domain. We explore two architectures: i) domain adversarial training using a gradient reversal layer (GRL) and ii) domain separation networks (DSN). The GRL and DSN architectures give absolute improvements of 6.71% and 7.32%, respectively, in word error…
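
The abstract names domain adversarial training with a gradient reversal layer (GRL). The general idea behind a GRL can be sketched without any deep learning framework: the layer passes features through unchanged in the forward pass and multiplies the incoming gradient by a negative factor in the backward pass, so the shared feature extractor is pushed to make source and target domains harder to tell apart. The class name, the `lam` scaling factor, and the gradient values below are illustrative assumptions, not the paper's implementation.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam  # hypothetical trade-off weight for the domain loss

    def forward(self, features: List[float]) -> List[float]:
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The domain classifier's gradient is sign-flipped and scaled
        # before it flows back into the shared feature extractor.
        return [-self.lam * g for g in grad_output]


def feature_gradient(grad_task: List[float],
                     grad_domain: List[float],
                     grl: GradientReversal) -> List[float]:
    """Gradient seen by the shared features: task gradient plus reversed domain gradient."""
    reversed_domain = grl.backward(grad_domain)
    return [t + d for t, d in zip(grad_task, reversed_domain)]


# Toy numbers only, chosen for illustration.
grl = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]
assert grl.forward(features) == features  # forward pass is the identity

combined = feature_gradient(grad_task=[1.0, 0.0, -2.0],
                            grad_domain=[0.4, 0.8, -0.2],
                            grl=grl)
print(combined)
```

In an autograd framework the same behaviour is usually expressed as a custom operation with an identity forward function and a negated backward function; the list-based version here only shows the arithmetic.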

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing