VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka

Li-Wei Chen; Hung-Shin Lee; Chen-Chi Chang

arXiv:2409.01548 · cs.SD · October 3, 2024

TL;DR

VoxHakka is a multi-dialect Taiwanese Hakka TTS system built on the YourTTS framework. It achieves high naturalness and pronunciation accuracy by training on dialect-specific data that was gathered with a web scraping pipeline and cleaned with ASR, and it outperforms existing Hakka TTS systems.

Contribution

This paper presents VoxHakka, the first high-quality, multi-dialect Hakka TTS system trained on a novel dataset created through web scraping and ASR-based data cleaning techniques.
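The ASR-based cleaning step mentioned above can be illustrated with a minimal sketch. This summary does not say which ASR model, metric, or threshold the authors used, so everything here is an assumption for illustration: the character error rate (CER) criterion, the `Clip` record, the `filter_clips` helper, and the `max_cer` value are all hypothetical. The idea is only that a scraped clip is kept when an ASR hypothesis agrees closely with the scraped transcript.

```python
from dataclasses import dataclass


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalised by the reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


@dataclass
class Clip:
    """Hypothetical record for one scraped audio clip."""
    clip_id: str
    transcript: str  # text scraped alongside the audio
    asr_output: str  # hypothesis produced by some ASR model (not run here)


def filter_clips(clips, max_cer=0.1):
    """Keep clips whose ASR hypothesis is within max_cer of the transcript.

    max_cer is an illustrative threshold, not a value from the paper.
    """
    return [c for c in clips
            if char_error_rate(c.transcript, c.asr_output) <= max_cer]
```

In practice the ASR hypotheses would come from a separate recognition pass over the scraped audio; the sketch takes them as given so that the filtering logic stays self-contained.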

Findings

1. VoxHakka outperforms existing Hakka TTS systems in naturalness and pronunciation accuracy.

2. The system supports six Hakka dialects with high speaker awareness.

3. The dataset and methods facilitate resource-efficient Hakka speech synthesis.

Abstract

This paper introduces VoxHakka, a text-to-speech (TTS) system designed for Taiwanese Hakka, a critically under-resourced language spoken in Taiwan. Leveraging the YourTTS framework, VoxHakka achieves high naturalness and accuracy and low real-time factor in speech synthesis while supporting six distinct Hakka dialects. This is achieved by training the model with dialect-specific data, allowing for the generation of speaker-aware Hakka speech. To address the scarcity of publicly available Hakka speech corpora, we employed a cost-effective approach utilizing a web scraping pipeline coupled with automatic speech recognition (ASR)-based data cleaning techniques. This process ensured the acquisition of a high-quality, multi-speaker, multi-dialect dataset suitable for TTS training. Subjective listening tests conducted using comparative mean opinion scores (CMOS) demonstrate that VoxHakka…

Code & Models

Models

Hugging Face: formospeech/yourtts-htia-240704

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems