Modelling low-resource accents without accent-specific TTS frontend

Georgi Tinchev; Marta Czarnowska; Kamil Deja; Kayoko Yanagisawa; Marius Cotescu

arXiv:2301.04606·eess.AS·January 12, 2023

Open Access

TL;DR

This paper models low-resource accents in TTS without an accent-specific frontend: voice conversion makes the target-accent data sound like the donor voice, and a multi-speaker multi-accent TTS model is then trained on recordings plus the synthetic data. The authors report state-of-the-art results compared to other generative models.

Contribution

It introduces a novel approach combining voice conversion and multi-accent TTS training to model low-resource accents without accent-specific frontends.

Findings

01 Achieves state-of-the-art results in accent modelling

02 Effective with limited data for low-resource accents

03 No need for accent-specific TTS frontends

Abstract

This work focuses on modelling a speaker's accent that does not have a dedicated text-to-speech (TTS) frontend, including a grapheme-to-phoneme (G2P) module. Prior work on modelling accents assumes a phonetic transcription is available for the target accent, which might not be the case for low-resource, regional accents. In our work, we propose an approach whereby we first augment the target accent data to sound like the donor voice via voice conversion, then train a multi-speaker multi-accent TTS model on the combination of recordings and synthetic data, to generate the donor's voice speaking in the target accent. Throughout the procedure, we use a TTS frontend developed for the same language but a different accent. We show qualitative and quantitative analysis where the proposed strategy achieves state-of-the-art results compared to other generative models. Our work demonstrates that…
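The two-step data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Utterance` record, the `VoiceConverter` signature, and both function names are hypothetical, and the voice-conversion model is passed in as a plain callable because the abstract does not specify one. Which recordings go into the final mix (here: donor recordings plus original target-accent recordings) is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Utterance:
    """One training example (hypothetical record layout)."""
    text: str
    audio: Sequence[float]  # stand-in for a waveform
    speaker: str
    accent: str
    synthetic: bool = False


# Hypothetical signature: (source audio, donor speaker id) -> converted audio.
VoiceConverter = Callable[[Sequence[float], str], Sequence[float]]


def augment_with_donor_voice(
    target_accent_data: Sequence[Utterance],
    donor_speaker: str,
    convert: VoiceConverter,
) -> List[Utterance]:
    """Step 1: make target-accent recordings sound like the donor voice.

    The accent label is kept; only the speaker identity changes.
    """
    return [
        replace(
            utt,
            audio=convert(utt.audio, donor_speaker),
            speaker=donor_speaker,
            synthetic=True,
        )
        for utt in target_accent_data
    ]


def build_training_set(
    donor_data: Sequence[Utterance],
    target_accent_data: Sequence[Utterance],
    donor_speaker: str,
    convert: VoiceConverter,
) -> List[Utterance]:
    """Step 2 input: recordings plus synthetic data for multi-speaker,
    multi-accent TTS training (assumed mix, see the note above)."""
    synthetic = augment_with_donor_voice(target_accent_data, donor_speaker, convert)
    return list(donor_data) + list(target_accent_data) + synthetic
```

In this sketch the synthetic utterances carry the donor's speaker label together with the target accent label, which is the combination the TTS model is later asked to generate. Text would be phonemised by a frontend built for a different accent of the same language, as the abstract states; that step is left out here.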

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing