Multi-Speaker End-to-End Speech Synthesis

Jihyun Park; Kexin Zhao; Kainan Peng; Wei Ping

arXiv:1907.04462 · cs.CL · July 11, 2019 · 25 citations

TL;DR

This paper extends ClariNet, a fully end-to-end (text-to-wave) speech synthesis model, to multiple speakers. Trainable low-dimensional speaker embeddings are shared across the model's components, and the authors report higher naturalness than state-of-the-art systems.

Contribution

The paper extends ClariNet to multi-speaker synthesis by incorporating speaker embeddings and demonstrates improved naturalness over state-of-the-art models.

Findings

1. Reported to outperform state-of-the-art systems in speech naturalness
2. Shares trainable speaker embeddings across all model components
3. Achieves high-fidelity multi-speaker speech synthesis

Abstract

In this work, we extend ClariNet (Ping et al., 2019), a fully end-to-end speech synthesis model (i.e., text-to-wave), to generate high-fidelity speech from multiple speakers. To model the unique characteristics of different voices, low-dimensional trainable speaker embeddings are shared across each component of ClariNet and trained together with the rest of the model. We demonstrate that the multi-speaker ClariNet outperforms state-of-the-art systems in terms of naturalness, because the whole model is jointly optimized in an end-to-end manner.
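
The idea of one speaker embedding table shared across components can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the component names, dimensions, the per-component projection, the softsign nonlinearity, and the additive bias are all assumptions made for the example, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes and component names below are hypothetical, chosen for illustration.
NUM_SPEAKERS = 4
EMBED_DIM = 16  # "low-dimensional" embedding; the exact size is an assumption
HIDDEN_DIMS = {"encoder": 32, "decoder": 64, "vocoder": 48}

# A single embedding table, shared by every component of the model.
speaker_table = rng.normal(scale=0.1, size=(NUM_SPEAKERS, EMBED_DIM))

# Each component gets its own projection from the shared embedding
# to that component's hidden width (an assumed design, not from the paper).
projections = {
    name: rng.normal(scale=0.1, size=(EMBED_DIM, dim))
    for name, dim in HIDDEN_DIMS.items()
}


def softsign(x):
    """Softsign nonlinearity: x / (1 + |x|)."""
    return x / (1.0 + np.abs(x))


def condition(hidden, speaker_id, component):
    """Add a speaker-dependent bias to hidden states of shape (time, dim)."""
    embedding = speaker_table[speaker_id]                # (EMBED_DIM,)
    bias = softsign(embedding @ projections[component])  # (dim,)
    return hidden + bias                                 # broadcast over time


hidden = np.zeros((10, HIDDEN_DIMS["decoder"]))
out_a = condition(hidden, speaker_id=0, component="decoder")
out_b = condition(hidden, speaker_id=2, component="decoder")
print(out_a.shape, np.allclose(out_a, out_b))
```

In a trained system the table and projections would be learned jointly with the rest of the network, which is the sense in which the abstract says the embeddings are "trained together with the rest of the model".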

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Mixture of Logistic Distributions · Dilated Causal Convolution · Attention Is All You Need · Weight Normalization · Softmax · L1 Regularization · WaveNet · Dense Connections · Softsign Activation