Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

Shubhi Tyagi; Marco Nicolis; Jonas Rohnke; Thomas Drugman; Jaime Lorenzo-Trueba

arXiv:1912.00955·cs.CL·April 21, 2021

TL;DR

This paper introduces a linguistics-driven acoustic embedding selection method to enhance prosody and naturalness in speech synthesis, especially for stylistic and long-form speech, by leveraging semantic and syntactic features.

Contribution

It proposes a novel embedding selection approach that exploits linguistic information to improve dynamic prosody in TTS systems, addressing naturalness and variability.

Findings

01. Improved prosody and naturalness in complex utterances
02. Enhanced performance in Long Form Reading (LFR) scenarios
03. Effective use of semantic and syntactic features

Abstract

Recent advances in Text-to-Speech (TTS) have brought quality and naturalness close to human level for isolated sentences. What is still lacking for human-like communication, however, are the dynamic variations and adaptability of human speech. This work attempts to achieve more dynamic and natural intonation in TTS systems, particularly for stylistic speech such as the newscaster speaking style. We propose a novel embedding selection approach that exploits linguistic information to leverage the speech variability present in the training dataset. We analyze the contributions of both semantic and syntactic features. Our results show that the approach improves prosody and naturalness for complex utterances as well as in Long Form Reading (LFR).
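The abstract does not spell out how an embedding is selected. The sketch below illustrates the general idea of linguistics-driven selection under stated assumptions. It is not the authors' implementation. Assume each training utterance is stored as a pair: a linguistic feature vector for its text (for example, a sentence embedding or syntactic features) and the acoustic (prosody) embedding extracted from its recording. At synthesis time, the acoustic embedding of the training utterance whose linguistic features are closest to the input sentence is reused. The cosine-similarity nearest-neighbour rule and the names `TrainingEntry`, `cosine_similarity` and `select_acoustic_embedding` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical training-set entry: (linguistic feature vector of the text,
# acoustic/prosody embedding extracted from the recorded utterance).
TrainingEntry = Tuple[List[float], List[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_acoustic_embedding(
    query_linguistic: Sequence[float],
    training_set: List[TrainingEntry],
) -> List[float]:
    """Return the acoustic embedding of the training utterance whose
    linguistic features are most similar to the query sentence's features."""
    if not training_set:
        raise ValueError("training_set must not be empty")
    # Nearest neighbour in linguistic-feature space (assumed selection rule).
    _, best_acoustic = max(
        training_set,
        key=lambda entry: cosine_similarity(query_linguistic, entry[0]),
    )
    return best_acoustic


if __name__ == "__main__":
    training = [
        ([1.0, 0.0], [0.1, 0.2, 0.3]),  # toy utterance A
        ([0.0, 1.0], [0.9, 0.8, 0.7]),  # toy utterance B
    ]
    # The query's features are closest to utterance A, so A's prosody embedding is reused.
    print(select_acoustic_embedding([0.9, 0.1], training))
```

Reusing an embedding from a real recording, instead of a single averaged one, is one way to exploit the variability of the training data. The similarity measure is the part a real system would refine, for instance by separately weighting semantic and syntactic features.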
