PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu

TL;DR
This paper presents PnG BERT, an augmented encoder for neural TTS that integrates phoneme and grapheme inputs with pre-training, improving speech naturalness and pronunciation accuracy.
Contribution
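The paper introduces PnG BERT, an encoder that extends BERT to take phonemes, graphemes, and the word-level alignment between them as input. It is pre-trained with self-supervision on a large text corpus and then fine-tuned as the encoder of a neural TTS model.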
Findings
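A pre-trained PnG BERT encoder yields more natural prosody and more accurate pronunciation than a phoneme-only baseline without pre-training.
In side-by-side evaluations, raters show no statistically significant preference between speech synthesized with PnG BERT and ground-truth recordings from professional speakers.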
Abstract
This paper introduces PnG BERT, a new encoder model for neural TTS. This model is augmented from the original BERT model by taking both phoneme and grapheme representations of text as input, together with the word-level alignment between them. It can be pre-trained on a large text corpus in a self-supervised manner, and fine-tuned on a TTS task. Experimental results show that a neural TTS model using a pre-trained PnG BERT as its encoder yields more natural prosody and more accurate pronunciation than a baseline model using only phoneme input with no pre-training. Subjective side-by-side preference evaluations show that raters have no statistically significant preference between the speech synthesized using a PnG BERT and ground truth recordings from professional speakers.
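The combined phoneme and grapheme input described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a BERT-style layout of `[CLS]`, the phoneme segment, `[SEP]`, the grapheme (WordPiece) segment, `[SEP]`, and it encodes the word-level alignment by giving each word's phonemes and graphemes the same word-position index. The helper names `build_png_input` and `mask_whole_words`, and the choice to mask whole words across both segments for self-supervised pre-training, are hypothetical illustrations of BERT-style masked language modeling.

```python
import random
from dataclasses import dataclass

CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"

@dataclass
class PngInput:
    tokens: list[str]
    segment_ids: list[int]  # 0 = phoneme segment, 1 = grapheme segment (assumed layout)
    word_pos: list[int]     # same index for a word's phonemes and graphemes; 0 = special token

def build_png_input(words):
    """Hypothetical helper. `words` is a list of (phonemes, graphemes) pairs, one per word."""
    tokens, segs, wpos = [CLS], [0], [0]
    # Phoneme segment: word i contributes its phonemes, all tagged with word position i.
    for i, (phonemes, _) in enumerate(words, start=1):
        tokens += phonemes
        segs += [0] * len(phonemes)
        wpos += [i] * len(phonemes)
    tokens.append(SEP); segs.append(0); wpos.append(0)
    # Grapheme segment: the same word index i links these pieces to the phonemes above.
    for i, (_, graphemes) in enumerate(words, start=1):
        tokens += graphemes
        segs += [1] * len(graphemes)
        wpos += [i] * len(graphemes)
    tokens.append(SEP); segs.append(1); wpos.append(0)
    return PngInput(tokens, segs, wpos)

def mask_whole_words(inp, mask_prob, rng):
    """Hypothetical masking: pick words with probability `mask_prob` and replace all of
    their tokens in BOTH segments with [MASK]. Simplified: no 80/10/10 replacement."""
    word_ids = sorted({w for w in inp.word_pos if w > 0})
    chosen = {w for w in word_ids if rng.random() < mask_prob}
    masked, labels = [], []
    for tok, w in zip(inp.tokens, inp.word_pos):
        if w in chosen:
            masked.append(MASK); labels.append(tok)   # model must predict the original token
        else:
            masked.append(tok); labels.append(None)   # not part of the loss
    return masked, labels

words = [(["HH", "AH0", "L", "OW1"], ["hello"]), (["W", "ER1", "L", "D"], ["world"])]
inp = build_png_input(words)
print(inp.tokens)
print(inp.word_pos)
```

In this sketch, masking a word in both segments at once means the model cannot recover a masked grapheme by reading the same word's unmasked phonemes (or vice versa), so it has to rely on the surrounding context.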
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Linear Layer · SM3 · Weight Decay · WordPiece · Adam · Softmax · Dense Connections · Attention Is All You Need · Linear Warmup With Linear Decay
