Analysis of Data Augmentation Methods for Low-Resource Maltese ASR

Andrea DeMarco, Carlos Mena, Albert Gatt, Claudia Borg, Aiden Williams, and Lonneke van der Plas

arXiv:2111.07793·cs.CL·January 23, 2023

TL;DR

This paper evaluates various data augmentation techniques to enhance Maltese speech recognition in low-resource settings, demonstrating a 15% absolute WER improvement by combining methods without relying on language models.

Contribution

It systematically compares unsupervised, multilingual, and synthesized speech augmentation methods for Maltese ASR, identifying effective combinations for low-resource scenarios.

Findings

01 Combining augmentation methods improves WER by 15% absolute.

02 Synthesized speech significantly boosts recognition accuracy.

03 No language model is needed for substantial performance gains.

Abstract

Recent years have seen an increased interest in the computational speech processing of Maltese, but resources remain sparse. In this paper, we consider data augmentation techniques for improving speech recognition for low-resource languages, focusing on Maltese as a test case. We consider three different types of data augmentation: unsupervised training, multilingual training, and the use of synthesized speech as training data. The goal is to determine which of these techniques, or which combination of them, is most effective at improving speech recognition for languages where the starting point is a small corpus of approximately 7 hours of transcribed speech. Our results show that combining the data augmentation techniques studied here leads to an absolute WER improvement of 15% without the use of a language model.
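The abstract reports its headline result as an absolute improvement in word error rate (WER). As a reading aid, the following minimal Python sketch shows how WER is conventionally computed (word-level edit distance divided by the number of reference words) and how an absolute improvement differs from a relative one. The function names and the example numbers are assumptions made for illustration; they are not taken from the paper, and this is not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, chosen only to show the arithmetic (not from the paper):
# a baseline WER of 0.50 reduced to 0.35 is a 0.15 (15 point) absolute
# improvement, which corresponds to a 30% relative improvement.
hypothetical_baseline, hypothetical_new = 0.50, 0.35
abs_gain = absolute_improvement(hypothetical_baseline, hypothetical_new)
rel_gain = relative_improvement(hypothetical_baseline, hypothetical_new)
```

The distinction matters when comparing papers: the same 15-point absolute gain is a larger relative gain when the baseline WER is lower.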

Code & Models

Datasets

MLRS/masri_synthetic
dataset· 174 dl

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques