Improving low-resource ASR performance with untranscribed out-of-domain data

Jayadev Billa

arXiv:2106.01227 · cs.CL · June 3, 2021 · 1 citation

TL;DR

This paper shows that, for low-resource ASR, a two-stage training approach (pre-training on out-of-domain web data, then fine-tuning) improves recognition accuracy, reducing WER by up to 16.3% relative to the baseline.

Contribution

The paper introduces a simple semi-supervised training method that leverages untranscribed out-of-domain web data for low-resource ASR and reports consistent WER improvements across multiple languages.
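
The difference between pooling and the two-stage approach is mainly one of data ordering. The following is a minimal sketch of that contrast, not the author's implementation. It assumes the second stage fine-tunes on the in-domain training data, which the summary does not spell out. All names (`Stage`, `pooled_schedule`, `two_stage_schedule`, and the corpus labels) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """One training stage; `data` lists the corpora used in it."""
    name: str
    data: List[str]


def pooled_schedule(in_domain: str, out_of_domain: str) -> List[Stage]:
    # Single stage: both corpora are mixed and trained on together.
    return [Stage("pooled", [in_domain, out_of_domain])]


def two_stage_schedule(in_domain: str, out_of_domain: str) -> List[Stage]:
    # Stage 1 uses only the out-of-domain data (assumed here to carry
    # automatically generated transcripts from SST); stage 2 fine-tunes
    # on the in-domain data. The stage-2 corpus is an assumption.
    return [
        Stage("pretrain", [out_of_domain]),
        Stage("finetune", [in_domain]),
    ]


def describe(schedule: List[Stage]) -> str:
    # Render a schedule as a readable chain of stages.
    return " -> ".join(f"{s.name}({'+'.join(s.data)})" for s in schedule)


if __name__ == "__main__":
    # Hypothetical corpus labels, for illustration only.
    print(describe(pooled_schedule("telephony", "youtube_sst")))
    print(describe(two_stage_schedule("telephony", "youtube_sst")))
```

In the pooled case the model sees mismatched and matched data at the same time; in the two-stage case the matched data has the last word, which is one plausible reading of why ordering matters.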

Findings

01  Up to 16.3% relative WER reduction over baseline

02  Training on out-of-domain data before fine-tuning yields better results

03  Pooling out-of-domain data with training data can sometimes decrease performance
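
For readers unfamiliar with the metric, a relative WER reduction expresses the improvement as a fraction of the baseline WER, not as a difference in absolute percentage points. A small sketch of the arithmetic follows. The WER values in the usage example are hypothetical, chosen only so that the result lands near 16.3%; they are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates on the same scale (e.g. percent).
    A positive result means the new system is better than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical numbers: 50.0% baseline WER, 41.85% after retraining.
    reduction = relative_wer_reduction(50.0, 41.85)
    print(f"{reduction:.1f}% relative WER reduction")
```

With these illustrative values the absolute improvement is 8.15 points, while the relative reduction is about 16.3%, which is why the two figures should not be confused when comparing systems.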

Abstract

Semi-supervised training (SST) is a common approach to leverage untranscribed/unlabeled speech data to improve automatic speech recognition performance in low-resource languages. However, if the available unlabeled speech is mismatched to the target domain, SST is not as effective, and in many cases performs worse than the original system. In this paper, we address the issue of low-resource ASR when only untranscribed out-of-domain speech data is readily available in the target language. Specifically, we look to improve performance on conversational/telephony speech (target domain) using web resources, in particular YouTube data, which more closely resembles news/topical broadcast data. Leveraging SST, we show that while in some cases simply pooling the out-of-domain data with the training data lowers word error rate (WER), in all cases, we see improvements if we train first with the…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems