Controllable Neural Prosody Synthesis

Max Morrison; Zeyu Jin; Justin Salamon; Nicholas J. Bryan; Gautham J. Mysore

arXiv:2008.03388 · eess.AS · August 13, 2020

TL;DR

This paper introduces a neural prosody generator that gives users control over speech prosody, so they can correct prosody errors and produce diverse speaker emotions and excitement levels while maintaining naturalness.

Contribution

The paper presents a user-controllable, context-aware neural prosody generator, together with a pitch-shifting neural vocoder that modifies input speech to match the generated prosody.

Findings

01 Successful incorporation of user control without losing naturalness

02 Effective correction of prosody errors in synthesized speech

03 Enhanced diversity in speaker emotions and excitement levels

Abstract

Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators. However, these systems lack intuitive user controls over prosody, making them unable to rectify prosody errors (e.g., misplaced emphases and contextually inappropriate emotions) or generate prosodies with diverse speaker excitement levels and emotions. We address these limitations with a user-controllable, context-aware neural prosody generator. Given a real or synthesized speech recording, our model allows a user to input prosody constraints for certain time frames and generates the remaining time frames from input text and contextual prosody. We also propose a pitch-shifting neural vocoder to modify input speech to match the synthesized prosody. Through objective and subjective evaluations we show that we can successfully incorporate user…
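
The constraint interface described in the abstract (the user fixes prosody for some time frames and the remaining frames are generated) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name, the use of None to mark unconstrained frames, and the per-frame pitch representation in Hz are hypothetical. Linear interpolation is only a stand-in for the paper's neural generator, which conditions on input text and contextual prosody and is not implemented here.

```python
from typing import List, Optional


def fill_unconstrained_frames(constraints: List[Optional[float]]) -> List[float]:
    """Return a full per-frame pitch contour in Hz.

    Frames with a user-supplied value are kept unchanged. Frames marked None
    are filled by linear interpolation between the nearest constrained frames,
    or by holding the nearest constrained value at the edges. The interpolation
    is a placeholder (assumption) for a learned prosody generator.
    """
    known = [i for i, v in enumerate(constraints) if v is not None]
    if not known:
        raise ValueError("at least one frame must be constrained")

    contour: List[float] = []
    for i, value in enumerate(constraints):
        if value is not None:
            # User constraint: pass through untouched.
            contour.append(float(value))
            continue
        # Nearest constrained frame on each side, if any.
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            contour.append(float(constraints[right]))
        elif right is None:
            contour.append(float(constraints[left]))
        else:
            w = (i - left) / (right - left)
            contour.append((1 - w) * constraints[left] + w * constraints[right])
    return contour


# Hypothetical usage: the user pins frames 1 and 3, the rest are generated.
user_constraints = [None, 100.0, None, 200.0, None]
print(fill_unconstrained_frames(user_constraints))
```

The property the sketch is meant to show is that constrained frames pass through unchanged while only the unconstrained frames are generated. In a real system the fill step would be replaced by the model's prediction.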
