Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech

Vatsal Aggarwal; Marius Cotescu; Nishant Prateek; Jaime Lorenzo-Trueba; and Roberto Barra-Chicote

arXiv:1911.12760·cs.LG·February 18, 2020

TL;DR

This paper introduces a novel TTS system that uses VAEs and Normalizing Flows to synthesize expressive speech in a new style from just one example, improving naturalness and emotional intensity.

Contribution

It enhances style disentanglement in TTS with VAE and Householder Flow, enabling expressive speech synthesis from a single reference utterance.

Findings

01

22% reduction in KL-divergence

02

9% relative naturalness improvement over standard Neural Text-to-Speech

03

Perceived emotional intensity of 59, compared to 55 for neutral speech (MUSHRA evaluation)

Abstract

We propose a Text-to-Speech method to create an unseen expressive style using one utterance of expressive speech of around one second. Specifically, we enhance the disentanglement capabilities of a state-of-the-art sequence-to-sequence based system with a Variational AutoEncoder (VAE) and a Householder Flow. The proposed system provides a 22% KL-divergence reduction while jointly improving perceptual metrics over state-of-the-art. At synthesis time we use one example of expressive style as a reference input to the encoder for generating any text in the desired style. Perceptual MUSHRA evaluations show that we can create a voice with a 9% relative naturalness improvement over standard Neural Text-to-Speech, while also improving the perceived emotional intensity (59 compared to the 55 of neutral speech).
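The abstract names a VAE with a Householder Flow but does not spell out the transform. As a minimal sketch of the general technique, and not the authors' implementation, the NumPy code below applies a chain of Householder reflections to a latent sample drawn with the usual VAE reparameterization, and computes the closed-form KL term for a diagonal Gaussian posterior against a standard normal prior. The latent size (8), the number of flow steps (3), the random Householder vectors, and all function names are assumptions made for illustration; in a trained model the vectors would be produced by the encoder network.

```python
import numpy as np

def householder_step(z, v):
    """Reflect z across the hyperplane orthogonal to v: z - 2 v (v.z) / ||v||^2."""
    z = np.asarray(z, dtype=float)
    v = np.asarray(v, dtype=float)
    return z - 2.0 * v * (v @ z) / (v @ v)

def householder_flow(z0, vs):
    """Apply one Householder reflection per row of vs, in order."""
    z = z0
    for v in vs:
        z = householder_step(z, v)
    return z

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Illustrative values only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
latent_dim, num_steps = 8, 3
mu = rng.normal(size=latent_dim)
log_var = rng.normal(size=latent_dim)

# Reparameterization: z0 = mu + sigma * eps, with eps ~ N(0, I).
z0 = mu + np.exp(0.5 * log_var) * rng.normal(size=latent_dim)

# Hypothetical Householder vectors, one per flow step.
vs = rng.normal(size=(num_steps, latent_dim))
zK = householder_flow(z0, vs)

print("KL term:", gaussian_kl(mu, log_var))
print("norm before/after flow:", np.linalg.norm(z0), np.linalg.norm(zK))
```

Each reflection is an orthogonal map, so the flow preserves the norm of the latent vector and its Jacobian determinant has absolute value 1. This means the flow adds no log-determinant term to the training objective, which is one reason this family of flows is inexpensive to add to a VAE.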
