A Language Modeling Approach to Diacritic-Free Hebrew TTS
Amit Roth, Arnon Turetzky, Yossi Adi

TL;DR
This paper introduces a diacritics-free language modeling approach for Hebrew TTS, effectively handling the absence of diacritics in modern Hebrew to improve pronunciation accuracy and speech naturalness.
Contribution
The paper presents a language modeling method for Hebrew TTS that operates without diacritics. Trained on weakly supervised data, it outperforms the diacritic-based systems it is compared against.
Findings
The proposed method outperforms the diacritic-based baselines in content preservation.
It also produces more natural-sounding speech than those baselines.
It remains effective when trained on in-the-wild, weakly supervised data.
Abstract
We tackle the task of text-to-speech (TTS) in Hebrew. Traditional Hebrew contains diacritics, which dictate how individuals should pronounce given words; modern Hebrew, however, rarely uses them. Without diacritics, readers are expected to infer the correct pronunciation, and which phonemes to use, from context. This poses a fundamental challenge for TTS systems, which must accurately map text to speech. In this work, we propose adopting a diacritics-free language modeling approach for the task of Hebrew TTS. The model operates on discrete speech representations and is conditioned on a word-piece tokenizer. We optimize the proposed method using in-the-wild, weakly supervised data and compare it to several diacritic-based TTS systems. Results suggest the proposed method is superior to the evaluated baselines considering both content…
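To make the text side of the abstract concrete, here is a minimal sketch of what "diacritics-free" input and word-piece conditioning could look like. It is not the authors' implementation: the function names `strip_diacritics` and `wordpiece`, the `##` continuation prefix, and the two-entry toy vocabulary are all assumptions made for illustration, and the greedy longest-match rule is the generic WordPiece procedure rather than anything the abstract specifies.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (Hebrew niqqud are Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def wordpiece(word: str, vocab: set, unk: str = "[UNK]") -> list:
    """Greedy longest-match-first segmentation of one word.

    Non-initial pieces carry a hypothetical '##' continuation prefix.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# "shalom" written with diacritics (shin dot, qamats, holam), as escapes.
POINTED = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
# The same word as it usually appears in modern Hebrew text.
PLAIN = strip_diacritics(POINTED)

# Hypothetical toy vocabulary: a two-letter prefix and a two-letter suffix.
TOY_VOCAB = {"\u05e9\u05dc", "##\u05d5\u05dd"}
PIECES = wordpiece(PLAIN, TOY_VOCAB)
```

In a system of the kind the abstract describes, token sequences like `PIECES` would condition a language model that predicts discrete speech units; the sketch stops at the text side because the abstract gives no further detail on the speech representation.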
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
