ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages

Mahta Fetrat Qharabagh; Zahra Dehghanian; Hamid R. Rabiee

arXiv:2409.07259·cs.SD·September 12, 2024

TL;DR

This paper introduces ManaTTS, a roughly 86-hour single-speaker Persian speech corpus, together with an open pipeline for building transcribed speech datasets, including a forced alignment method designed for low-resource languages.

Contribution

It provides the largest publicly available single-speaker Persian speech corpus and a transparent, MIT-licensed pipeline with tools for sentence tokenization, bounded audio segmentation, and forced alignment, the last designed for low-resource languages.

Findings

01. A Tacotron2-based TTS model trained on ManaTTS achieved a mean opinion score (MOS) of 3.76, close to natural speech quality.

02. The dataset creation and alignment pipeline is fully open and MIT-licensed.

03. The VirgoolInformal dataset (over 5 hours of audio) was created to evaluate the Persian speech recognition models used for forced alignment.

Abstract

In this study, we introduce ManaTTS, the most extensive publicly accessible single-speaker Persian corpus, and a comprehensive framework for collecting transcribed speech datasets for the Persian language. ManaTTS, released under the open CC-0 license, comprises approximately 86 hours of audio with a sampling rate of 44.1 kHz. Alongside ManaTTS, we also generated the VirgoolInformal dataset to evaluate Persian speech recognition models used for forced alignment, extending over 5 hours of audio. The datasets are supported by a fully transparent, MIT-licensed pipeline, a testament to innovation in the field. It includes unique tools for sentence tokenization, bounded audio segmentation, and a novel forced alignment method. This alignment technique is specifically designed for low-resource languages, addressing a crucial need in the field. With this dataset, we trained a Tacotron2-based…

Taxonomy

Topics: Natural Language Processing Techniques