Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

TL;DR
This paper introduces a phoneme-level BERT model trained to predict graphemes and phonemes, significantly enhancing the naturalness of synthesized speech in TTS systems, especially on out-of-distribution texts.
Contribution
The paper presents a novel phoneme-level BERT with a grapheme prediction task, improving TTS prosody and naturalness over existing models.
Findings
Significant naturalness MOS improvement over the SOTA StyleTTS baseline
Enhanced prosody for out-of-distribution texts
Effective phoneme-grapheme joint prediction
Abstract
Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns. However, these models are usually word-level or sup-phoneme-level and jointly trained with phonemes, making them inefficient for the downstream TTS task where only phonemes are needed. In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions. Subjective evaluations show that our phoneme-level BERT encoder has significantly improved the mean opinion scores (MOS) of rated naturalness of synthesized speech compared with the state-of-the-art (SOTA) StyleTTS baseline on out-of-distribution (OOD) texts.
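The abstract describes the pretext task only at a high level. As an illustration, here is a minimal, stdlib-only Python sketch of how the two training targets could be built from phonemized text and combined into one loss. It rests on several assumptions that are not stated in the abstract: each phoneme's grapheme target is simply the word it belongs to, all phonemes of a selected word are masked together, and the masked-phoneme and grapheme cross-entropy terms are summed with a weight `alpha`. The names `build_example`, `cross_entropy`, `joint_loss`, `MASK`, `mask_prob` and `alpha` are hypothetical. This is not the authors' implementation; in a real model the logits would come from two output heads on top of the phoneme-level encoder.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

MASK = "<mask>"  # hypothetical mask token


def build_example(
    words: List[Tuple[str, List[str]]], mask_prob: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[Optional[str]], List[str]]:
    """Flatten (word, phonemes) pairs into phoneme-level inputs and two target streams.

    inputs:           phoneme tokens, with masked words replaced by MASK
    phoneme_targets:  the original phoneme at masked positions, None elsewhere
    grapheme_targets: the word each phoneme belongs to, at every position
    """
    rng = random.Random(seed)
    inputs: List[str] = []
    phoneme_targets: List[Optional[str]] = []
    grapheme_targets: List[str] = []
    for word, phonemes in words:
        # Assumption: mask all phonemes of a word together (whole-word masking).
        masked = rng.random() < mask_prob
        for ph in phonemes:
            inputs.append(MASK if masked else ph)
            phoneme_targets.append(ph if masked else None)
            grapheme_targets.append(word)
    return inputs, phoneme_targets, grapheme_targets


def cross_entropy(logits: Dict[str, float], target: str) -> float:
    """Negative log-softmax probability of `target`, computed stably."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[target]


def joint_loss(
    phoneme_logits: List[Dict[str, float]],
    grapheme_logits: List[Dict[str, float]],
    phoneme_targets: List[Optional[str]],
    grapheme_targets: List[str],
    alpha: float = 1.0,
) -> float:
    """Masked phoneme loss (masked positions only) + alpha * grapheme loss (all positions)."""
    masked = [i for i, t in enumerate(phoneme_targets) if t is not None]
    phoneme_term = (
        sum(cross_entropy(phoneme_logits[i], phoneme_targets[i]) for i in masked) / len(masked)
        if masked
        else 0.0
    )
    grapheme_term = sum(
        cross_entropy(lg, t) for lg, t in zip(grapheme_logits, grapheme_targets)
    ) / len(grapheme_targets)
    return phoneme_term + alpha * grapheme_term


if __name__ == "__main__":
    # ARPAbet phonemes for two words; mask_prob=1.0 masks every word.
    words = [("hello", ["HH", "AH0", "L", "OW1"]), ("world", ["W", "ER1", "L", "D"])]
    inputs, p_tgt, g_tgt = build_example(words, mask_prob=1.0)
    print(inputs)  # every phoneme replaced by "<mask>"
    print(g_tgt)   # ['hello', 'hello', 'hello', 'hello', 'world', 'world', 'world', 'world']
```

Masking whole words is a design choice of this sketch: if only a single phoneme were masked, the model could often recover it from the unmasked phonemes of the same word, which makes the masked-phoneme task easier than intended.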
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
Methods: Attention Is All You Need · Dense Connections · Adam · Softmax · Layer Normalization · Linear Warmup With Linear Decay · Linear Layer · Dropout · Weight Decay
