Cross-lingual Low Resource Speaker Adaptation Using Phonological Features
Georgia Maniati, Nikolaos Ellinas, Konstantinos Markopoulos, Georgios Vamvoukakis, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis

TL;DR
This paper presents a phonological feature-based multilingual TTS model that enables effective cross-lingual speaker adaptation with very limited data, achieving high naturalness and speaker similarity even in few-shot scenarios.
Contribution
It introduces a language-agnostic multispeaker TTS model conditioned on phonological features, enabling cross-lingual adaptation with minimal data and demonstrating few-shot learning capabilities.
Findings
High speaker similarity with as few as 8 utterances.
Model performs well in zero-shot and few-shot adaptation scenarios.
Effective across multiple language pairs with phonological features.
Abstract
The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has been recently proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless uttering of foreign text embedded in a stream of native text. In our work, we train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages, with the goal of achieving cross-lingual speaker adaptation. We first experiment with the effect of language phonological similarity on cross-lingual TTS of several source-target language combinations. Subsequently, we fine-tune the model with very limited data of a new speaker's voice in either a seen or an unseen language, and achieve synthetic speech of equal quality, while preserving the target speaker's identity. With as few as 32…
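To make the input representation concrete, here is a minimal sketch of how a phoneme sequence might be turned into binary phonological feature vectors, so that the same input space can be shared across languages. The feature inventory, the phoneme table, and the function names (FEATURES, PHONEME_FEATURES, encode_phoneme, encode_sequence) are all illustrative assumptions, not the feature set or code used by the authors.

```python
from typing import Dict, List, Set

# Hypothetical, deliberately tiny feature inventory (NOT the paper's feature set).
FEATURES: List[str] = [
    "vowel", "consonant", "voiced", "nasal", "plosive",
    "fricative", "bilabial", "alveolar", "high", "low",
]

# Illustrative descriptions for a handful of IPA phonemes (assumed for this sketch).
PHONEME_FEATURES: Dict[str, Set[str]] = {
    "p": {"consonant", "plosive", "bilabial"},
    "b": {"consonant", "voiced", "plosive", "bilabial"},
    "m": {"consonant", "voiced", "nasal", "bilabial"},
    "t": {"consonant", "plosive", "alveolar"},
    "s": {"consonant", "fricative", "alveolar"},
    "z": {"consonant", "voiced", "fricative", "alveolar"},
    "i": {"vowel", "voiced", "high"},
    "a": {"vowel", "voiced", "low"},
}


def encode_phoneme(phoneme: str) -> List[int]:
    """Return a binary vector with one entry per feature in FEATURES."""
    active = PHONEME_FEATURES[phoneme]
    return [1 if feature in active else 0 for feature in FEATURES]


def encode_sequence(phonemes: List[str]) -> List[List[int]]:
    """Encode a phoneme sequence as a list of feature vectors."""
    return [encode_phoneme(p) for p in phonemes]


if __name__ == "__main__":
    for symbol, vector in zip(["b", "a", "s"], encode_sequence(["b", "a", "s"])):
        print(symbol, vector)
```

In this toy table, "p" and "b" differ only in the voiced entry. That is the intuition behind using features rather than phoneme identities: a phoneme absent from the training languages can still be described by a combination of features the model has already seen.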
