Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning
Jama Hussein Mohamud, Lloyd Acquaye Thompson, Aissatou Ndoye, and Laurent Besacier

TL;DR
This paper demonstrates that self-supervised speech representation learning enables effective ASR development for low-resource African languages using minimal transcribed data, highlighting the importance of pre-training on large raw speech datasets.
Contribution
It introduces a practical approach for developing ASR systems for African languages with limited labeled data, leveraging self-supervised learning and pre-training techniques.
Findings
Pre-training on large raw speech datasets improves ASR performance.
Effective ASR systems can be built with only 1 hour of transcribed speech.
Self-supervised learning is crucial in low-resource language settings.
Abstract
This paper describes the results of an informal collaboration launched during the African Master of Machine Intelligence (AMMI) in June 2020. After a series of lectures and labs on speech data collection using mobile applications and on self-supervised representation learning from speech, a small group of students and the lecturer continued working on an automatic speech recognition (ASR) project for three languages: Wolof, Ga, and Somali. This paper describes how data was collected and how ASR systems were developed with a small amount (1h) of transcribed speech as training data. In these low-resource conditions, pre-training a model on large amounts of raw speech was fundamental for the efficiency of the ASR systems developed.
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
