TL;DR
This paper explores unsupervised speech representation learning on noisy radio archives as a route to speech recognition for illiterate speakers of low-resource languages, and releases datasets and models for West African languages.
Contribution
It introduces two new datasets, a speech encoder trained on radio archives, and the first speech recognition models for several West African languages.
Findings
West African wav2vec performs comparably to larger models on multilingual tasks.
The models significantly improve language identification in low-resource settings.
The paper presents the first speech recognition models for Maninka, Pular, and Susu.
Abstract
For many of the 700 million illiterate people around the world, speech recognition technology could provide a bridge to valuable information and services. Yet, those most in need of this technology are often the most underserved by it. In many countries, illiterate people tend to speak only low-resource languages, for which the datasets necessary for speech technology development are scarce. In this paper, we investigate the effectiveness of unsupervised speech representation learning on noisy radio broadcasting archives, which are abundant even in low-resource languages. We make three core contributions. First, we release two datasets to the research community. The first, West African Radio Corpus, contains 142 hours of audio in more than 10 languages with a labeled validation subset. The second, West African Virtual Assistant Speech Recognition Corpus, consists of 10K labeled audio…
