Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

Zhen Zeng; Jianzong Wang; Ning Cheng; Jing Xiao

arXiv:2008.05656·eess.AS·August 14, 2020·1 cites

TL;DR

This paper introduces a prosody learning mechanism for speech synthesis that jointly models the variability of prosody and its correlation with semantics. A novel local attention structure removes the limit on input text length, and the combined system improves the naturalness of synthesized speech.

Contribution

The paper proposes a new prosody learning approach combined with a local attention mechanism that removes input length restrictions in TTS systems.

Findings

01 Improved prosody quality in synthesized speech.

02 Significant increase in mean opinion score (MOS) for Mandarin synthesis.

03 Enhanced naturalness of speech output.

Abstract

Recent neural speech synthesis systems have gradually focused on the control of prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody and the correlation between prosody and semantics together. In this paper, a prosody learning mechanism is proposed to model the prosody of speech based on a TTS system, where the prosody information of speech is extracted from the mel-spectrum by a prosody learner and combined with the phoneme sequence to reconstruct the mel-spectrum. Meanwhile, semantic features of the text from a pre-trained language model are introduced to improve the prosody prediction results. In addition, a novel self-attention structure, named local attention, is proposed to lift the restriction on input text length, where the relative position information of the sequence is modeled by the relative position matrices so that the…
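
To make the local attention idea concrete, here is a minimal sketch of windowed self-attention with a relative position bias. It is not the authors' implementation: the abstract is truncated and does not give the exact formulation, so the function name `local_attention`, the scalar bias table `rel_bias`, and the `window` parameter are all assumptions chosen for illustration. The sketch only shows why such a design need not depend on sequence length: each position looks at a fixed-size neighbourhood, and position enters only through the offset between two positions.

```python
import math


def local_attention(queries, keys, values, rel_bias, window):
    """Windowed self-attention with a relative position bias (illustrative only).

    queries, keys, values: lists of equal length; each item is a list of floats.
    rel_bias: hypothetical lookup table mapping an offset (j - i) in
        [-window, window] to a scalar added to the attention score.
    window: number of neighbours visible on each side of a position.
    """
    n = len(queries)
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for i in range(n):
        # Position i only sees positions in [i - window, i + window].
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        scores = []
        for j in range(lo, hi):
            dot = sum(q * k for q, k in zip(queries[i], keys[j]))
            # The bias depends on the offset j - i, never on absolute i or j.
            scores.append(dot * scale + rel_bias[j - i])
        # Numerically stable softmax over the visible window.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out = [0.0] * len(values[0])
        for w, j in zip(weights, range(lo, hi)):
            for d, v in enumerate(values[j]):
                out[d] += (w / total) * v
        outputs.append(out)
    return outputs
```

Because `rel_bias` holds only `2 * window + 1` entries, the same table can be applied to a sequence of any length, and the output at a given position is unaffected by tokens outside its window. A trained system would learn these biases (and would likely use vector-valued relative position embeddings per attention head); the scalar table here is a simplification.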

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling