YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova, Julian Weber, Christopher Shulby, Arnaldo Candido Junior, Eren Gölge, Moacir Antonelli Ponti

TL;DR
YourTTS is a multilingual, zero-shot multi-speaker TTS and voice conversion system built on VITS. It achieves state-of-the-art results in zero-shot multi-speaker TTS, shows promising results in a target language with only a single-speaker dataset, and can be fine-tuned with less than 1 minute of speech.
Contribution
It extends the VITS model with novel modifications for zero-shot multilingual and multi-speaker TTS, enabling high-quality synthesis with minimal data and in low-resource languages.
Findings
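Achieves state-of-the-art (SOTA) results in zero-shot multi-speaker TTS.
Achieves results comparable to SOTA in zero-shot voice conversion on the VCTK dataset.
Can be fine-tuned with less than 1 minute of speech, reaching SOTA voice similarity with reasonable quality.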
Abstract
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity and with reasonable quality. This is important to allow synthesis for speakers with a very different voice or recording characteristics from…
Code & Models
- 🤗 DigitalUmuganda/Kinyarwanda_YourTTS_v1 · model · 34 dl · ♡ 4
- 🤗 DigitalUmuganda/Kinyarwanda_YourTTS · model · 3 dl · ♡ 2
- 🤗 infinisoft/tts · model · ♡ 4
- 🤗 Bilgilice/bilgilice35 · model
- 🤗 Pendrokar/xvapitch · model · ♡ 2
- 🤗 praveenchordia/tts · model · ♡ 1
- 🤗 DigitalUmuganda/KinyarwandaTTS_female_voice · model · 7 dl · ♡ 1
- 🤗 cshulby/YourTTS · model · ♡ 9
- 🤗 anilaks/yourtts-vm · model
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling
Methods: Normalizing Flows · Transformer · HiFi-GAN
