Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training

J. Yang; Lei He

arXiv:2201.08124·cs.SD·January 21, 2022·6 cites

Open Access

TL;DR

This paper proposes a multi-task learning framework with speaker classifier joint training for cross-lingual text-to-speech synthesis, improving speaker similarity for both seen and unseen speakers.

Contribution

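It combines multi-task learning with speaker classifier joint training, and uses a scheme similar to parallel scheduled sampling so that joint training does not break the transformer's parallel training, to improve cross-lingual speaker similarity in TTS models.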

Findings

01. Improved cross-lingual speaker similarity in subjective evaluations.

02. Enhanced objective metrics for speaker similarity.

03. Effective for both seen and unseen speakers.

Abstract

In cross-lingual speech synthesis, speech in various languages can be synthesized for a monoglot speaker. Normally, only data from monoglot speakers are available for model training, so the speaker similarity between the synthesized cross-lingual speech and the native-language recordings is relatively low. Based on a multilingual transformer text-to-speech model, this paper studies a multi-task learning framework to improve cross-lingual speaker similarity. To further improve speaker similarity, joint training with a speaker classifier is proposed. A scheme similar to parallel scheduled sampling is proposed to train the transformer model efficiently, so that introducing joint training does not break the parallel training mechanism. By using multi-task learning and speaker classifier joint training, in subjective and objective evaluations, the cross-lingual speaker…
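
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the excerpt does not give the loss terms, the weighting, or the sampling schedule, so the function names (`mix_decoder_inputs`, `joint_loss`), the per-frame mixing rule, and the `speaker_weight` parameter are all assumptions made for illustration. The first function shows the general parallel scheduled sampling pattern, in which a teacher-forced first pass produces predictions for every step at once, and a second pass then receives a per-step mix of ground-truth and predicted frames. The second function shows one plausible way a speaker classification loss could be added to a synthesis loss.

```python
import math
import random


def mix_decoder_inputs(gold_frames, predicted_frames, gold_prob, rng):
    """Per time step, keep the gold frame with probability gold_prob,
    otherwise use the first-pass prediction (assumed mixing rule)."""
    if len(gold_frames) != len(predicted_frames):
        raise ValueError("sequences must have the same length")
    return [
        gold if rng.random() < gold_prob else predicted
        for gold, predicted in zip(gold_frames, predicted_frames)
    ]


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for a single example, computed stably."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target_index]


def joint_loss(tts_loss, speaker_logits, speaker_id, speaker_weight):
    """Weighted sum of a synthesis loss and a speaker classification loss.
    speaker_weight is a hypothetical hyperparameter, not from the paper."""
    return tts_loss + speaker_weight * cross_entropy(speaker_logits, speaker_id)


# Toy usage with made-up two-dimensional "frames" and three speakers.
rng = random.Random(0)
gold = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
first_pass = [[0.0, 0.2], [0.3, 0.5], [0.4, 0.6]]
second_pass_inputs = mix_decoder_inputs(gold, first_pass, gold_prob=0.5, rng=rng)
loss = joint_loss(0.8, speaker_logits=[2.0, 0.5, -1.0], speaker_id=0, speaker_weight=0.1)
```

Because the mixed inputs are built from a completed first pass, the second pass can still process all time steps in parallel, which is the property the abstract says the scheme preserves.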

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques