Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS

Onkar Kishor Susladkar; Vishesh Tripathi; Biddwan Ahmed

arXiv:2410.06608·cs.SD·October 10, 2024

Open Access

TL;DR

This paper presents a new comprehensive Bahasa TTS dataset and a novel EnGen-TTS model that significantly improves speech synthesis quality and efficiency for the Bahasa language.

Contribution

Introduces a large, diverse Bahasa TTS dataset and a novel EnGen-TTS model with superior performance and efficiency over existing baselines.

Findings

01

EnGen-TTS achieves a MOS of 4.45 ± 0.13.

02

The dataset contains approximately 55 hours and 52K audio samples.

03

EnGen-TTS demonstrates efficient real-time performance.

Abstract

This research introduces a comprehensive Bahasa text-to-speech (TTS) dataset and a novel TTS model, EnGen-TTS, designed to enhance the quality and versatility of synthetic speech in the Bahasa language. The dataset, spanning approximately 55.0 hours and 52K audio recordings, integrates diverse textual sources, ensuring linguistic richness. A meticulous recording setup captures the nuances of Bahasa phonetics, employing professional equipment to ensure high-fidelity audio samples. Statistical analysis reveals the dataset's scale and diversity, laying the foundation for model training and evaluation. The proposed EnGen-TTS model performs better than established baselines, achieving a Mean Opinion Score (MOS) of 4.45 ± 0.13. Additionally, our investigation on real-time factor and model size highlights EnGen-TTS as a compelling choice, with efficient performance. This research marks a…
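The abstract reports three kinds of numbers: a MOS with a ± margin, a real-time factor, and the dataset's size. The sketch below shows one common way such figures are computed. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say whether ± 0.13 is a confidence interval or a standard deviation, so a 95% normal-approximation interval is assumed here; the ratings and timings are made up; and all function names are hypothetical.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Mean opinion score and the half-width of a normal-approximation
    confidence interval (z=1.96 corresponds to roughly 95%).
    ASSUMPTION: the paper's "±" may be defined differently."""
    mean = statistics.fmean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * sem


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced.
    Values below 1.0 mean synthesis runs faster than real time."""
    return synthesis_seconds / audio_seconds


def mean_clip_seconds(total_hours, num_clips):
    """Average clip duration implied by a dataset's total size."""
    return total_hours * 3600 / num_clips


# Made-up listener ratings on the usual 1-5 scale (not data from the paper).
ratings = [5, 4, 5, 4, 4, 5, 5, 4, 5, 4]
mos, half_width = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} ± {half_width:.2f}")

# Made-up timing: 0.5 s of compute to produce 2.0 s of audio.
print(f"RTF = {real_time_factor(0.5, 2.0):.2f}")

# Figures from the abstract: about 55 hours across about 52K recordings.
print(f"Mean clip length = {mean_clip_seconds(55.0, 52_000):.1f} s")
```

With the rounded figures from the abstract, the last calculation implies clips of a little under four seconds on average; the exact value depends on the unrounded dataset statistics.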

Taxonomy

Topics: Educational Technology Systems