ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS
Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie

TL;DR
This paper introduces ParaTTS, a paragraph-based neural TTS model that effectively captures cross-sentence linguistic and prosodic context, resulting in more natural and coherent speech synthesis over entire paragraphs.
Contribution
The paper proposes a novel multi-module TTS framework that models paragraph-level context using cross-sentence attention and explicit position encoding, improving over sentence-based models.
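To make the idea of cross-sentence attention with an explicit position signal concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: ParaTTS uses learned networks trained jointly with a modified Tacotron2, whereas this toy version has no learned parameters. The function names, the choice to append a relative position in [0, 1] as an extra feature, and the example vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relative_positions(num_sentences):
    """Hypothetical position signal: sentence index scaled to [0, 1]."""
    if num_sentences == 1:
        return [0.0]
    return [i / (num_sentences - 1) for i in range(num_sentences)]


def cross_sentence_attention(sentence_vectors):
    """Single-head scaled dot-product attention across sentence vectors.

    Each sentence vector is extended with its relative position, then
    every sentence attends to all sentences in the paragraph (itself
    included). Returns the context vectors and the attention weights.
    """
    positions = relative_positions(len(sentence_vectors))
    augmented = [vec + [pos] for vec, pos in zip(sentence_vectors, positions)]
    dim = len(augmented[0])

    contexts, all_weights = [], []
    for query in augmented:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in augmented
        ]
        weights = softmax(scores)
        context = [
            sum(w * value[d] for w, value in zip(weights, augmented))
            for d in range(dim)
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


# Toy paragraph of three sentences, each a made-up 2-dimensional embedding.
paragraph = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts, weights = cross_sentence_attention(paragraph)
```

Each row of `weights` is a probability distribution over the sentences of the paragraph, so every context vector is a weighted mix of all sentences rather than a function of one sentence alone. That is the sense in which a paragraph-based model can condition on its neighbours, which a sentence-based model cannot.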
Findings
Enhanced naturalness and quality in paragraph TTS
Better prediction of prosodic variations between sentences
Consistent preference for paragraph-based synthesis in subjective tests
Abstract
Recent advances in neural end-to-end TTS models have produced high-quality, natural synthesized speech in conventional sentence-based TTS. However, it remains challenging to reach similar quality when synthesizing a whole paragraph, where a large amount of contextual information must be considered in building a paragraph-based TTS model. To alleviate the difficulty in training, we propose to model linguistic and prosodic information by considering the cross-sentence, embedded structure in training. Three sub-modules, including linguistics-aware, prosody-aware and sentence-position networks, are trained together with a modified Tacotron2. Specifically, to learn the information embedded in a paragraph and the relations among the corresponding component sentences, we utilize linguistics-aware and prosody-aware networks. The information in a paragraph is captured by…
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Softmax · Linear Layer
