Enhancing TTS Stability in Hebrew using Discrete Semantic Units

Ella Zeldes; Or Tal; Yossi Adi

arXiv:2410.21502·cs.SD·October 30, 2024

Open Access

TL;DR

This paper presents a novel TTS approach using discrete semantic units derived from HuBERT codes, significantly improving stability and robustness in Hebrew speech synthesis while maintaining naturalness and speaker similarity.

Contribution

Introduces LOTHM, a TTS method leveraging self-supervised semantic units to enhance stability and reduce diacritic dependency, applicable across languages.

Findings

01 Achieves higher stability in Hebrew TTS

02 Maintains naturalness and speaker similarity

03 Demonstrates adaptability to English

Abstract

This study introduces a refined approach to Text-to-Speech (TTS) generation that significantly enhances sampling stability across languages, with a particular focus on Hebrew. By leveraging discrete semantic units with higher phonetic correlation obtained from a self-supervised model, our method addresses the inherent instability often encountered in TTS systems, especially those dealing with non-diacriticized scripts like Hebrew. Utilizing HuBERT codes, our model generates discrete representations that are optimized for TTS tasks, thereby reducing the dependency on diacritic-based text processing. This advancement not only simplifies the language modeling process but also improves robustness and offers controllability of the speech output, owing to the disentanglement properties of the semantic units. The inclusion of a speaker embedding in the vocoder further aids in capturing the unique…
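To make the idea of "discrete semantic units" concrete, the following is a minimal sketch and not the authors' implementation. It assumes that units are obtained by assigning each frame-level feature vector to its nearest centroid in a pre-trained codebook, a common way of deriving HuBERT codes. The function names `quantize` and `collapse_repeats`, the toy 2-D features, and the three-entry codebook are all hypothetical; collapsing repeated units is a common optional step in unit-based pipelines and is not confirmed by the abstract.

```python
import math

def quantize(frames, centroids):
    """Map each feature frame to the index of its nearest centroid (Euclidean)."""
    units = []
    for frame in frames:
        distances = [math.dist(frame, c) for c in centroids]
        units.append(distances.index(min(distances)))
    return units

def collapse_repeats(units):
    """Merge runs of identical consecutive units into one unit (optional step)."""
    collapsed = []
    for u in units:
        if not collapsed or collapsed[-1] != u:
            collapsed.append(u)
    return collapsed

# Toy 2-D "features" (assumption: real self-supervised features are
# high-dimensional and the codebook has far more than three entries).
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.1), (0.1, 0.9), (0.0, 1.0)]

units = quantize(frames, centroids)      # one unit ID per frame
deduped = collapse_repeats(units)        # shorter sequence for language modeling
print(units, deduped)
```

In a pipeline of the kind the abstract describes, a language model would predict such unit sequences from text, and a vocoder conditioned on a speaker embedding would turn them back into audio.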

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques