Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

Martijn Bartelds, Nay San, Bradley McDonnell, Dan Jurafsky, and Martijn Wieling

arXiv:2305.10951 · cs.CL · May 22, 2023 · 2 cites

Open Access · 1 Repo

TL;DR

This paper shows that data augmentation improves low-resource ASR for typologically diverse minority languages: self-training reduces word error rate (WER) by up to 20.5% relative, and text-to-speech (TTS) augmentation by up to 25.5% relative.

Contribution

The paper applies self-training and TTS-based data augmentation to ASR for low-resource, typologically diverse languages and reports substantial relative WER reductions.

Findings

01. Self-training yields up to 20.5% relative WER reduction.

02. TTS augmentation achieves up to 25.5% relative WER reduction.

03. Data augmentation effectively improves low-resource ASR performance.
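
The findings above are stated as relative WER reductions. The following is a minimal sketch of how WER and a relative reduction are commonly computed. It is not taken from the paper's repository: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and the baseline and augmented WER values in the example are illustrative numbers chosen to produce 20.5%, not results reported by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (assumed, not from the paper):
# a baseline WER of 0.400 and an augmented-system WER of 0.318.
example_reduction = relative_wer_reduction(0.400, 0.318)
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Note that a relative reduction is measured against the baseline WER, so the same percentage corresponds to a smaller absolute gain when the baseline is already low.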

Abstract

The performance of automatic speech recognition (ASR) systems has advanced substantially in recent years, particularly for languages for which a large amount of transcribed speech is available. Unfortunately, for low-resource languages, such as minority languages, regional languages or dialects, ASR performance generally remains much lower. In this study, we investigate whether data augmentation techniques could help improve low-resource ASR performance, focusing on four typologically diverse minority languages or language variants (West Germanic: Gronings, West-Frisian; Malayo-Polynesian: Besemah, Nasal). For all four languages, we examine the use of self-training, where an ASR system trained with the available human-transcribed data is used to generate transcriptions, which are then combined with the original data to train a new ASR system. For Gronings, for which there was a…
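
The self-training procedure described in the abstract (train on human-transcribed data, transcribe further audio with that system, then retrain on the combined data) can be sketched as follows. This is a toy illustration of the data flow only, not the authors' implementation: `self_train` and `train_nearest_neighbour` are hypothetical names, each audio clip is represented by a single number, and a nearest-neighbour lookup stands in for a real ASR model.

```python
from typing import Callable, List, Sequence, Tuple

Audio = float                 # toy stand-in: one number per audio clip (assumption)
Example = Tuple[Audio, str]   # (audio, transcript)
Model = Callable[[Audio], str]


def train_nearest_neighbour(examples: Sequence[Example]) -> Model:
    """Toy 'ASR training': return a model that copies the nearest example's transcript."""
    stored = list(examples)
    if not stored:
        raise ValueError("need at least one training example")

    def transcribe(audio: Audio) -> str:
        nearest = min(stored, key=lambda ex: abs(ex[0] - audio))
        return nearest[1]

    return transcribe


def self_train(
    labeled: Sequence[Example],
    unlabeled: Sequence[Audio],
    train_fn: Callable[[Sequence[Example]], Model],
) -> Tuple[Model, List[Example]]:
    """One round of self-training."""
    # 1. Train a first system on the human-transcribed data.
    teacher = train_fn(labeled)
    # 2. Use it to generate transcriptions for untranscribed audio.
    pseudo_labeled = [(audio, teacher(audio)) for audio in unlabeled]
    # 3. Train a new system on the original plus machine-transcribed data.
    student = train_fn(list(labeled) + pseudo_labeled)
    return student, pseudo_labeled


# Placeholder data (assumed for illustration).
labeled_data = [(0.0, "hello"), (1.0, "goodbye")]
unlabeled_audio = [0.1, 0.9]
student_model, pseudo = self_train(labeled_data, unlabeled_audio, train_nearest_neighbour)
```

In a real setup, `train_fn` would fine-tune a speech model and the machine-generated transcriptions might be filtered before retraining; whether and how the authors do this is not stated in the excerpt above.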

Code & Models

Repositories

bartelds/asr-augmentation
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Phonetics and Phonology Research