Building African Voices
Perez Ogayo, Graham Neubig, Alan W Black

TL;DR
This paper develops methods for creating speech synthesis systems for low-resource African languages by curating datasets, providing guidelines, and sharing resources to enable accessible TTS development.
Contribution
It introduces a participatory approach for dataset creation, guidelines for low-resource TTS development, and releases resources for 12 African languages.
Findings
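Synthesizers produce intelligible speech with only 25 minutes of newly recorded speech, even when it is recorded in suboptimal environments.
New datasets are created, and existing "found" recordings are curated, through a participatory approach that considers accessibility, quality, and breadth.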
Resources are publicly released for community use.
Abstract
Modern speech synthesis techniques can produce natural-sounding speech given sufficient high-quality data and compute resources. However, such data is not readily available for many languages. This paper focuses on speech synthesis for low-resourced African languages, from corpus creation to sharing and deploying the Text-to-Speech (TTS) systems. We first create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources and subject-matter expertise. Next, we create new datasets and curate datasets from "found" data (existing recordings) through a participatory approach while considering accessibility, quality, and breadth. We demonstrate that we can develop synthesizers that generate intelligible speech with 25 minutes of created speech, even when recorded in suboptimal environments. Finally, we release the speech data, code, and…
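The abstract's 25-minute figure gives a concrete size target for a new recording effort. The sketch below is not from the paper. It illustrates one way to track progress toward that target: it totals the duration of a folder of uncompressed PCM WAV prompts using Python's standard `wave` module. The folder layout and the function names `wav_seconds`, `corpus_minutes`, and `meets_target` are hypothetical.

```python
import wave
from pathlib import Path

# Target taken from the 25-minute figure in the abstract; the check itself is illustrative.
TARGET_MINUTES = 25.0


def wav_seconds(path: Path) -> float:
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def corpus_minutes(folder: str) -> float:
    """Sum the durations of all .wav files under `folder` (recursively), in minutes."""
    total_seconds = sum(wav_seconds(p) for p in Path(folder).rglob("*.wav"))
    return total_seconds / 60.0


def meets_target(folder: str, target: float = TARGET_MINUTES) -> bool:
    """Return True if the recorded speech in `folder` reaches `target` minutes."""
    return corpus_minutes(folder) >= target
```

For example, `meets_target("recordings/")` returns True once the hypothetical `recordings/` folder holds at least 25 minutes of audio. Duration alone says nothing about recording quality or transcript accuracy, which the paper treats as separate concerns.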
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems
