Accented Text-to-Speech Synthesis with Limited Data

Xuehao Zhou; Mingyang Zhang; Yi Zhou; Zhizheng Wu; Haizhou Li

arXiv:2305.04816 · eess.AS · May 9, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a limited-data accented TTS framework that models phonetic and prosodic variations separately, enabling effective accent rendering with minimal target accent data.

Contribution

It proposes a two-model accented TTS system with pre-training and fine-tuning, specifically designed for low-resource accent adaptation in speech synthesis.

Findings

1. Effective phonetic variation handling with a small lexicon
2. Improved prosodic rendering with limited speech data
3. Enhanced speech quality and accent similarity

Abstract

This paper presents an accented text-to-speech (TTS) synthesis framework with limited training data. We study two aspects concerning accent rendering: phonetic (phoneme difference) and prosodic (pitch pattern and phoneme duration) variations. The proposed accented TTS framework consists of two models: an accented front-end for grapheme-to-phoneme (G2P) conversion and an accented acoustic model with integrated pitch and duration predictors for phoneme-to-Mel-spectrogram prediction. The accented front-end directly models the phonetic variation, while the accented acoustic model explicitly controls the prosodic variation. Specifically, both models are first pre-trained on a large amount of data, then only the accent-related layers are fine-tuned on a limited amount of data for the target accent. In the experiments, speech data of three English accents, i.e., General American English, Irish…
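The fine-tuning step described in the abstract (pre-train everything, then update only the accent-related layers) can be illustrated with a minimal, framework-free sketch. The abstract does not say which layers count as accent-related, so the parameter names below and the choice of the pitch and duration predictors as the fine-tuned subset are assumptions made for illustration only. `Param` and `select_for_finetuning` are hypothetical helpers, not part of the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a model parameter: a name plus a trainable flag."""
    name: str
    trainable: bool = True


def select_for_finetuning(params, accent_prefixes):
    """Freeze every parameter except those whose name starts with an
    accent-related prefix, and return the names left trainable."""
    prefixes = tuple(accent_prefixes)
    for p in params:
        p.trainable = p.name.startswith(prefixes)
    return [p.name for p in params if p.trainable]


# Hypothetical parameter names for a pre-trained acoustic model.
params = [
    Param("encoder.layer0.weight"),
    Param("pitch_predictor.weight"),
    Param("duration_predictor.weight"),
    Param("decoder.layer0.weight"),
]

# Assumption: the prosody predictors are the accent-related layers.
trainable = select_for_finetuning(
    params, ["pitch_predictor.", "duration_predictor."]
)
print(trainable)
```

In a real training framework the same selection would usually be expressed by disabling gradient updates on the frozen parameters, so that the limited target-accent data only adjusts the selected subset.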

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques
