Building African Voices

Perez Ogayo; Graham Neubig; Alan W Black

arXiv:2207.00688 · cs.CL · July 5, 2022 · 1 citation

TL;DR

This paper covers speech synthesis for low-resource African languages end to end: it provides build guidelines, creates and curates datasets, and shares the resulting resources so that TTS development is more accessible.

Contribution

It introduces a participatory approach for dataset creation, guidelines for low-resource TTS development, and releases resources for 12 African languages.

Findings

01 Synthesizers produce intelligible speech from only 25 minutes of created speech, even when it is recorded in suboptimal environments.

02 Participatory dataset curation improves data quality and accessibility.

03 Resources are publicly released for community use.

Abstract

Modern speech synthesis techniques can produce natural-sounding speech given sufficient high-quality data and compute resources. However, such data is not readily available for many languages. This paper focuses on speech synthesis for low-resourced African languages, from corpus creation to sharing and deploying the Text-to-Speech (TTS) systems. We first create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources and subject-matter expertise. Next, we create new datasets and curate datasets from "found" data (existing recordings) through a participatory approach while considering accessibility, quality, and breadth. We demonstrate that we can develop synthesizers that generate intelligible speech with 25 minutes of created speech, even when recorded in suboptimal environments. Finally, we release the speech data, code, and…

Code & Models

Repositories

neulab/africanvoices (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems