Rich Prosody Diversity Modelling with Phone-level Mixture Density Network

Chenpeng Du; Kai Yu

arXiv:2102.00851 · cs.SD · October 3, 2023

TL;DR

This paper introduces a GMM-based mixture density network for phone-level prosody modeling, significantly enhancing the naturalness and diversity of synthetic speech compared to previous uni-modal approaches.

Contribution

It presents a novel GMM-MDN approach for phone-level prosody modeling, improving diversity and naturalness in speech synthesis.
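The GMM-MDN idea can be sketched as follows. For each phone, a network emits one raw vector that is split into the mixture weights, means, and standard deviations of a K-component Gaussian mixture. Training then minimizes the negative log-likelihood of the observed phone-level prosody vector under that mixture. This is a minimal plain-Python sketch that assumes diagonal covariances and a hypothetical output layout (logits, then means, then log standard deviations). The component count, the prosody dimension, and the helper names `split_mdn_output` and `gmm_nll` are illustrative and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]

def split_mdn_output(raw: Vector, k: int, d: int) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Turn one phone's raw network output into GMM parameters.

    Hypothetical layout: k mixture logits, then k*d means,
    then k*d log standard deviations (diagonal covariance).
    """
    if len(raw) != k * (1 + 2 * d):
        raise ValueError("raw output has the wrong length")
    logits = raw[:k]
    # Softmax (shifted by the max for stability): weights are positive and sum to 1.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    weights = [e / sum(exps) for e in exps]
    flat_means = raw[k:k + k * d]
    flat_log_stds = raw[k + k * d:]
    means = [flat_means[i * d:(i + 1) * d] for i in range(k)]
    # Exponentiating log standard deviations keeps every sigma strictly positive.
    stds = [[math.exp(s) for s in flat_log_stds[i * d:(i + 1) * d]] for i in range(k)]
    return weights, means, stds

def gmm_nll(x: Vector, weights: Vector, means: List[Vector], stds: List[Vector]) -> float:
    """Return -log sum_k w_k N(x; mu_k, diag(sigma_k^2)) for one prosody vector x."""
    log_terms = []
    for w, mu, sigma in zip(weights, means, stds):
        # Log density of a diagonal Gaussian = sum of per-dimension log densities.
        log_density = -0.5 * sum(
            ((xi - m) / s) ** 2 + 2.0 * math.log(s) + math.log(2.0 * math.pi)
            for xi, m, s in zip(x, mu, sigma)
        )
        log_terms.append(math.log(w) + log_density)
    # Log-sum-exp keeps the mixture likelihood from underflowing.
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))

# Hypothetical usage: K = 3 components over a 4-dimensional prosody vector.
K, D = 3, 4
raw = [0.1 * i for i in range(K * (1 + 2 * D))]
weights, means, stds = split_mdn_output(raw, K, D)
print(gmm_nll([0.5, 0.6, 0.7, 0.8], weights, means, stds))
```

In this sketch, setting K = 1 reduces the loss to the uni-modal (single-Gaussian) objective. The two models then differ only in the number of mixture components.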

Findings

1. GMM-MDN generates more natural prosody patterns.
2. The approach significantly improves prosody diversity.
3. Subjective evaluations favor GMM-MDN over single Gaussian models.

Abstract

Generating natural speech with diverse and smooth prosody patterns is a challenging task. Although random sampling from a phone-level prosody distribution has been investigated as a way to generate different prosody patterns, the diversity of the generated speech is still very limited and far from what humans achieve. This is largely due to the use of uni-modal distributions, such as a single Gaussian, in prior work on phone-level prosody modelling. In this work, we propose a novel approach that models phone-level prosodies with a GMM-based mixture density network (GMM-MDN). Experiments on the LJSpeech dataset demonstrate that phone-level prosodies can precisely control the synthetic speech and that GMM-MDN can generate more natural and smooth prosody patterns than a single Gaussian. Subjective evaluations further show that the proposed approach not only achieves better naturalness, but also…
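A toy example can illustrate random sampling from a phone-level prosody distribution and the limitation of uni-modal distributions mentioned in the abstract. The plain-Python sketch below uses hypothetical numbers that are not from the paper's LJSpeech experiments. It samples a GMM by first choosing a component and then drawing from that component's Gaussian. It compares the result with the single Gaussian that has the same mean and variance, which is the maximum-likelihood single-Gaussian fit to this mixture. `sample_gmm` and `share_in_valley` are illustrative helpers, not the authors' code.

```python
import math
import random
from typing import List, Sequence

def sample_gmm(weights: Sequence[float], means: Sequence[Sequence[float]],
               stds: Sequence[Sequence[float]], rng: random.Random) -> List[float]:
    """Draw one prosody vector from a diagonal-covariance GMM."""
    # 1) Pick a mixture component with probability equal to its weight.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # 2) Sample every dimension independently from that component's Gaussian.
    return [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]

# Toy 1-D prosody value with two modes (hypothetical numbers).
weights, means, stds = [0.5, 0.5], [[-1.0], [1.0]], [[0.2], [0.2]]

# Single Gaussian with the mixture's mean (0) and variance (0.2**2 + 1.0**2).
single_mean, single_std = 0.0, math.sqrt(0.2 ** 2 + 1.0 ** 2)

rng = random.Random(0)
gmm_draws = [sample_gmm(weights, means, stds, rng)[0] for _ in range(10_000)]
single_draws = [rng.gauss(single_mean, single_std) for _ in range(10_000)]

def share_in_valley(xs: Sequence[float], half_width: float = 0.3) -> float:
    """Fraction of samples near 0, the low-density valley between the two modes."""
    return sum(abs(x) < half_width for x in xs) / len(xs)

print(f"GMM:             {share_in_valley(gmm_draws):.3f}")
print(f"single Gaussian: {share_in_valley(single_draws):.3f}")
```

With these numbers, the single Gaussian places a little under a quarter of its samples in the valley between the modes, where the mixture has almost no probability mass. A uni-modal model can only widen or narrow one bump. A mixture can keep several distinct prosody patterns apart.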

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling