Building a Luganda Text-to-Speech Model From Crowdsourced Data

Sulaiman Kagumire; Andrew Katumba; Joyce Nakatumba-Nabende; John Quinn

arXiv:2405.10211·cs.SD·May 17, 2024


TL;DR

This paper shows that the quality of Luganda TTS trained on Common Voice recordings can be improved by training on multiple speakers with similar intonation and applying further preprocessing to the training data, despite the scarcity of high-quality recordings.

Contribution

The study introduces a method of enhancing Luganda TTS by selecting multiple speakers with close intonation and applying data preprocessing, leading to higher perceived speech quality.

Findings

01. TTS quality improved with multi-speaker training and preprocessing.

02. Model trained on six speakers outperforms single- and two-speaker models.

03. Subjective MOS increased from 2.5 to 3.55 with the proposed approach.
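
For context on the MOS figures above: a mean opinion score is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale, often reported with a confidence interval. The sketch below illustrates that arithmetic only. The function name `mean_opinion_score` and the ratings are hypothetical and are not taken from the paper, whose evaluation protocol is not described in this summary.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is a normal-approximation 95% confidence half-width;
    it is 0.0 when fewer than two ratings are given.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mos, 0.0
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Made-up ratings for illustration only; NOT data from the paper.
example_ratings = [4, 3, 4, 5, 3, 4, 2, 4]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

With few listeners the normal approximation is rough; a t-based interval or bootstrap would likely be more appropriate in a real evaluation.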

Abstract

Text-to-speech (TTS) development for African languages such as Luganda is still limited, primarily due to the scarcity of high-quality, single-speaker recordings essential for training TTS models. Prior work has focused on utilizing the Luganda Common Voice recordings of multiple speakers aged between 20-49. Although the generated speech is intelligible, it is still of lower quality than the model trained on studio-grade recordings. This is due to the insufficient data preprocessing methods applied to improve the quality of the Common Voice recordings. Furthermore, speech convergence is more difficult to achieve due to varying intonations, as well as background noise. In this paper, we show that the quality of Luganda TTS from Common Voice can improve by training on multiple speakers of close intonation in addition to further preprocessing of the training data. Specifically, we selected…

Taxonomy

Topics: Speech and dialogue systems · ICT in Developing Communities · Natural Language Processing Techniques