A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages

Sujitha Sathiyamoorthy (1), N Mohana (1), Anusha Prakash (3), Hema A Murthy (1, 2) ((1) Dept of Computer Science & Engineering, Indian Institute of Technology Madras, Chennai, India; (2) Shiv Nadar University, Chennai, India; (3) Independent Researcher, Bengaluru, India)

arXiv:2410.14197·eess.AS·October 21, 2024

TL;DR

This paper reviews 15 years of data collection efforts for Indian languages, emphasizing the importance of data quality and standardized protocols for developing effective TTS systems across diverse languages.

Contribution

It presents a unified framework and insights for collecting high-quality TTS datasets for Indian languages, facilitating consistent development across multiple languages.

Findings

1. High-quality data enables effective TTS with smaller datasets.
2. Standardized data collection protocols improve TTS system development.
3. Collected datasets have supported various TTS frameworks and prosodically rich synthesis.

Abstract

The performance of a text-to-speech (TTS) synthesis model depends on various factors, of which the quality of the training data is the most important. Vast amounts of data have been collected worldwide for many languages, but resources for Indian languages remain scarce. Although there have been many data collection efforts, a common set of data collection protocols is necessary for building TTS systems in Indian languages, primarily to ensure uniform development of TTS systems across languages. In this paper, we present our learnings from 15 years of data collection efforts for Indic languages. These databases have been used in unit selection synthesis, hidden Markov model (HMM)-based synthesis, and end-to-end frameworks, and for building prosodically rich TTS systems. The most significant feature of the collected data is that data purity enables building high-quality TTS…
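The abstract argues for a common set of data collection protocols and stresses data purity. As a purely illustrative sketch, the Python below shows how one shared protocol could be encoded once and applied identically to recordings in every language. The `Protocol` class, its threshold values, and `check_utterance` are hypothetical and are not taken from the paper; the sketch covers only simple signal-level checks (sampling rate, clipping, leading/trailing silence) on samples normalized to [-1, 1].

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Protocol:
    """Hypothetical recording protocol shared by all languages (values are illustrative only)."""
    sample_rate: int = 48_000          # required sampling rate in Hz (assumed value)
    clip_level: float = 0.999          # |sample| at or above this counts as clipped
    max_clipped_fraction: float = 0.001
    silence_level: float = 0.01        # |sample| below this counts as silence
    max_edge_silence_s: float = 0.5    # allowed leading/trailing silence in seconds


def check_utterance(samples: Sequence[float], sample_rate: int, proto: Protocol) -> List[str]:
    """Return the protocol violations for one recording; an empty list means it passes."""
    issues: List[str] = []
    if sample_rate != proto.sample_rate:
        issues.append(f"sample rate {sample_rate} Hz != required {proto.sample_rate} Hz")
    if not samples:
        issues.append("empty recording")
        return issues

    # Count samples at (or beyond) full scale, a common sign of clipping.
    clipped = sum(1 for x in samples if abs(x) >= proto.clip_level)
    if clipped / len(samples) > proto.max_clipped_fraction:
        issues.append(f"clipping in {clipped} of {len(samples)} samples")

    # Indices of samples above the silence threshold.
    voiced = [i for i, x in enumerate(samples) if abs(x) >= proto.silence_level]
    if not voiced:
        issues.append("no signal above the silence threshold")
        return issues

    # Leading/trailing silence, measured with the recording's own sampling rate.
    lead_s = voiced[0] / sample_rate
    trail_s = (len(samples) - 1 - voiced[-1]) / sample_rate
    if max(lead_s, trail_s) > proto.max_edge_silence_s:
        issues.append(
            f"edge silence {lead_s:.2f} s / {trail_s:.2f} s exceeds {proto.max_edge_silence_s} s"
        )
    return issues


if __name__ == "__main__":
    proto = Protocol(sample_rate=16_000)
    sr = 16_000
    # One second of a 200 Hz tone at half of full scale.
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
    print(check_utterance(tone, sr, proto))               # []
    print(check_utterance([0.0] * sr + tone, sr, proto))  # reports an edge-silence violation
```

Keeping every threshold in a single frozen `Protocol` object is one way to make the same checks apply uniformly across languages; the actual values would have to come from the real recording guidelines, which this page does not state.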

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Sparse Evolutionary Training