Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning

Jama Hussein Mohamud; Lloyd Acquaye Thompson; Aissatou Ndoye; Laurent Besacier

arXiv:2103.08993 · cs.SD · March 17, 2021 · AfricaNLP · 1 citation

TL;DR

This paper shows that self-supervised speech representation learning makes it possible to build ASR systems for three low-resource African languages (Wolof, Ga, and Somali) from about 1 hour of transcribed speech, provided the model is first pre-trained on large amounts of raw speech.

Contribution

It introduces a practical approach for developing ASR systems for African languages with limited labeled data, leveraging self-supervised learning and pre-training techniques.

Findings

01

Pre-training on large raw speech datasets improves ASR performance.

02

Effective ASR systems can be built with only 1 hour of transcribed speech.

03

Self-supervised learning is crucial in low-resource language settings.

Abstract

This paper describes the results of an informal collaboration launched during the African Master of Machine Intelligence (AMMI) in June 2020. After a series of lectures and labs on speech data collection using mobile applications and on self-supervised representation learning from speech, a small group of students and the lecturer continued working on an automatic speech recognition (ASR) project for three languages: Wolof, Ga, and Somali. This paper describes how the data was collected and how ASR systems were developed with a small amount (1h) of transcribed speech as training data. In these low-resource conditions, pre-training a model on large amounts of raw speech was fundamental to the efficiency of the ASR systems developed.
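To make the 1-hour training condition concrete, the sketch below shows one way a transcribed-speech subset could be assembled under a fixed duration budget. It is an illustration only, not the authors' procedure: the function name `select_training_subset`, the `(utterance id, duration)` record layout, and the random selection strategy are all assumptions introduced here.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (utterance id, duration in seconds).
Utterance = Tuple[str, float]


def select_training_subset(
    utterances: List[Utterance],
    budget_seconds: float = 3600.0,  # 1 hour, matching the training condition in the abstract
    seed: int = 0,
) -> Tuple[List[Utterance], float]:
    """Randomly pick utterances whose total duration stays within the budget.

    Assumption: random sampling is used here for simplicity; the paper's
    actual selection procedure is not described in this summary.
    """
    rng = random.Random(seed)
    shuffled = list(utterances)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)

    selected: List[Utterance] = []
    total = 0.0
    for utt_id, duration in shuffled:
        if total + duration > budget_seconds:
            continue  # this clip does not fit; a shorter one later might
        selected.append((utt_id, duration))
        total += duration
    return selected, total


if __name__ == "__main__":
    # Synthetic corpus: 1000 clips of 5 seconds each (5000 seconds in total).
    corpus = [(f"utt{i}", 5.0) for i in range(1000)]
    subset, total = select_training_subset(corpus)
    print(f"{len(subset)} utterances, {total / 3600:.2f} h of speech")
```

Fixing the seed keeps the subset reproducible, which matters when comparing models trained on the same small amount of labeled data.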

Code & Models

Repositories

besacier/AMMIcourse (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling