Scalable Controllable Accented TTS
Henry Li Xinyuan, Zexin Cai, Ashi Garg, Kevin Duh, Leibny Paola García-Perera, Sanjeev Khudanpur, Nicholas Andrews, Matthew Wiesner

TL;DR
This paper presents a scalable accented TTS system that leverages automatic accent label discovery and data augmentation techniques to improve performance on diverse and underrepresented accents, validated on the CommonVoice dataset.
Contribution
The paper introduces automatic accent labeling via a speech geolocation model and timbre augmentation via kNN voice conversion, enabling scalable and robust accented TTS training with reduced reliance on manual annotations.
Findings
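Outperforms XTTS-v2 fine-tuned on self-reported CommonVoice accent labels, as well as existing accented TTS benchmarks.
Improves accent diversity and model robustness through timbre augmentation.
Discovers accent labels effectively from raw speech using a speech geolocation model.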
Abstract
We tackle the challenge of scaling accented TTS systems, expanding their capabilities to include much larger amounts of training data and a wider variety of accent labels, even for accents that are poorly represented or unlabeled in traditional TTS datasets. To achieve this, we employ two strategies: 1. Accent label discovery via a speech geolocation model, which automatically infers accent labels from raw speech data without relying solely on human annotation; 2. Timbre augmentation through kNN voice conversion to increase data diversity and model robustness. These strategies are validated on CommonVoice, where we fine-tune XTTS-v2 for accented TTS with accent labels discovered or enhanced using geolocation. We demonstrate that the resulting accented TTS model not only outperforms XTTS-v2 fine-tuned on self-reported accent labels in CommonVoice, but also existing accented TTS…
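As a rough illustration of the second strategy, the sketch below shows the frame-matching step commonly used in kNN voice conversion: each source frame's feature vector is replaced by the average of its k most similar frames from a target speaker's feature pool. It is a minimal sketch under stated assumptions, not the authors' implementation. The function name `knn_convert`, the choice of cosine similarity, and the default `k=4` are assumptions made here for illustration; feature extraction with a speech encoder and waveform synthesis with a vocoder are omitted.

```python
import numpy as np

def knn_convert(source_feats, target_pool, k=4):
    """Replace each source frame with the mean of its k nearest target frames.

    source_feats: (n_src, d) array of frame-level features from the source utterance.
    target_pool:  (n_tgt, d) array of frame-level features from the target speaker.
    Hypothetical helper for illustration; requires k <= n_tgt.
    """
    eps = 1e-8  # guards against division by zero for all-zero frames
    src = source_feats / (np.linalg.norm(source_feats, axis=1, keepdims=True) + eps)
    tgt = target_pool / (np.linalg.norm(target_pool, axis=1, keepdims=True) + eps)

    # Cosine similarity between every source frame and every target frame.
    sim = src @ tgt.T  # shape (n_src, n_tgt)

    # Indices of the k most similar target frames for each source frame.
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    # Average the matched (unnormalised) target frames: shape (n_src, d).
    return target_pool[idx].mean(axis=1)


# Usage with random stand-in features (real systems would use encoder outputs).
rng = np.random.default_rng(0)
pool = rng.normal(size=(50, 8))      # target speaker frames
utterance = rng.normal(size=(5, 8))  # source utterance frames
converted = knn_convert(utterance, pool, k=4)
```

Because every output frame is built only from target-speaker frames, the converted features carry the target timbre while following the source utterance frame by frame, which is what makes this kind of conversion usable for timbre augmentation.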
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques
