TL;DR
This paper presents a multilingual text-to-speech model that uses meta-learning (contextual parameter generation) to share information across languages. It produces natural-sounding speech from less training data than previous approaches and improves code-switching synthesis.
Contribution
The authors introduce a meta-learning-based TTS model, built on Tacotron 2, that improves multilingual speech synthesis and voice cloning while requiring less training data than previous approaches.
Findings
Effective parameter sharing across languages
Superior naturalness and accuracy in code-switching speech
Robust performance with limited training data
Abstract
We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches. Our model is based on Tacotron 2 with a fully convolutional input text encoder whose weights are predicted by a separate parameter generator network. To boost voice cloning, the model uses an adversarial speaker classifier with a gradient reversal layer that removes speaker-specific information from the encoder. We arranged two experiments to compare our model with baselines using various levels of cross-lingual parameter sharing, in order to evaluate: (1) stability and performance when training on low amounts of data, (2) pronunciation accuracy and voice quality of code-switching synthesis. For training, we used the CSS10 dataset and our…
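The two mechanisms named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the single linear generator, the toy layer sizes, the language codes, and every name below (`generate_params`, `conv1d`, `GradientReversal`) are assumptions made for illustration. The first part shows contextual parameter generation, where a language embedding is mapped to the weights of a convolutional encoder layer. The second shows a gradient reversal layer, which passes activations through unchanged and flips the sign of the gradient on the way back.

```python
import random


def generate_params(lang_embedding, gen_weight, gen_bias):
    """Hypothetical linear parameter generator: theta = W @ e_lang + b."""
    return [
        sum(w * e for w, e in zip(row, lang_embedding)) + b
        for row, b in zip(gen_weight, gen_bias)
    ]


def conv1d(inputs, flat_kernel, in_ch, out_ch, width):
    """'Valid' 1D convolution over a sequence of in_ch-dimensional vectors.

    flat_kernel is laid out as [out_ch][width][in_ch], flattened.
    """
    outputs = []
    for t in range(len(inputs) - width + 1):
        frame = []
        for o in range(out_ch):
            acc = 0.0
            for k in range(width):
                for i in range(in_ch):
                    idx = (o * width + k) * in_ch + i
                    acc += flat_kernel[idx] * inputs[t + k][i]
            frame.append(acc)
        outputs.append(frame)
    return outputs


class GradientReversal:
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def forward(self, x):
        return list(x)

    def backward(self, grad_output):
        return [-self.scale * g for g in grad_output]


# Toy sizes, chosen only for illustration.
IN_CH, OUT_CH, WIDTH, LANG_DIM = 2, 3, 2, 4
N_PARAMS = OUT_CH * WIDTH * IN_CH

rng = random.Random(0)
gen_weight = [[rng.uniform(-1, 1) for _ in range(LANG_DIM)] for _ in range(N_PARAMS)]
gen_bias = [0.0] * N_PARAMS

# Hypothetical language embeddings; in a real model these would be learned.
lang_embeddings = {
    "de": [1.0, 0.0, 0.5, -0.5],
    "fr": [0.0, 1.0, -0.5, 0.5],
}

# A stand-in for five character embeddings of one input text.
text = [[rng.uniform(-1, 1) for _ in range(IN_CH)] for _ in range(5)]

# The same generator yields a different encoder kernel for each language.
encoded = {
    lang: conv1d(text, generate_params(emb, gen_weight, gen_bias), IN_CH, OUT_CH, WIDTH)
    for lang, emb in lang_embeddings.items()
}
```

In this sketch the only trainable objects shared across languages are the generator's weights and bias, while each language contributes just a small embedding. In an adversarial setup of the kind the abstract describes, the speaker classifier would read the encoder output through `GradientReversal.forward`, and the gradient it sends back would reach the encoder with its sign flipped.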
Taxonomy
Methods: Sigmoid Activation · Long Short-Term Memory · Highway Layer · Mixture of Logistic Distributions · Residual Connection · Zoneout · Batch Normalization · Bidirectional LSTM · Location Sensitive Attention · Dilated Causal Convolution
