Semi-Supervised Learning Based on Reference Model for Low-resource TTS

Xulong Zhang; Jianzong Wang; Ning Cheng; Jing Xiao

arXiv:2210.14723 · cs.SD · October 27, 2022

TL;DR

This paper introduces a semi-supervised neural TTS approach that leverages a reference model and pseudo labels to improve speech naturalness and robustness in low-resource scenarios.

Contribution

The paper proposes a novel semi-supervised training scheme for low-resource neural TTS that combines pre-training with pseudo-label guidance.

Findings

1. Significant improvement in voice naturalness and robustness.
2. Effective reduction of overfitting with limited target data.
3. Enhanced performance over traditional supervised methods.

Abstract

Most previous neural text-to-speech (TTS) methods rely on supervised learning, which means they depend on large training datasets and struggle to achieve comparable performance under low-resource conditions. To address this issue, we propose a semi-supervised learning method for neural TTS in which labeled target data is limited; the method can also resolve the problem of exposure bias in previous auto-regressive models. Specifically, we pre-train a reference model based on FastSpeech2 on abundant source data and then fine-tune it on a limited target dataset. Meanwhile, pseudo labels generated by the original reference model are used to further guide the fine-tuned model's training, achieving a regularization effect and reducing overfitting of the fine-tuned model on the limited target data. Experimental results show that our proposed semi-supervised learning…
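The training objective described in the abstract can be sketched as a supervised loss on the limited target data plus a weighted loss against pseudo labels from the frozen reference model. This is a toy illustration under stated assumptions, not the authors' implementation: a hypothetical scalar linear model (`LinearAcousticModel`) stands in for FastSpeech2, both losses are assumed to be mean squared error, the weight `lam` and the helpers `semi_supervised_loss` and `fine_tune` are our own hypothetical names, and pseudo labels are assumed to be computed on extra inputs that have no target-speaker recordings.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LinearAcousticModel:
    """Hypothetical stand-in for an acoustic model such as FastSpeech2:
    maps a scalar text feature x to a scalar acoustic value w * x + b."""
    w: float
    b: float

    def predict(self, x: float) -> float:
        return self.w * x + self.b


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def semi_supervised_loss(model: LinearAcousticModel,
                         reference: LinearAcousticModel,
                         target_x: Sequence[float],
                         target_y: Sequence[float],
                         unlabeled_x: Sequence[float],
                         lam: float) -> float:
    """Assumed objective: supervised MSE on the limited target data plus
    lam times the MSE against pseudo labels from the frozen reference."""
    supervised = mse([model.predict(x) for x in target_x], target_y)
    pseudo_labels = [reference.predict(x) for x in unlabeled_x]
    guidance = mse([model.predict(x) for x in unlabeled_x], pseudo_labels)
    return supervised + lam * guidance


def fine_tune(reference: LinearAcousticModel,
              target_x: Sequence[float],
              target_y: Sequence[float],
              unlabeled_x: Sequence[float],
              lam: float,
              lr: float = 0.05,
              steps: int = 500) -> LinearAcousticModel:
    """Full-batch gradient descent on semi_supervised_loss, starting from
    the pre-trained reference weights; the reference itself stays frozen."""
    model = LinearAcousticModel(reference.w, reference.b)
    # Pseudo labels come from the original (pre-fine-tuning) reference model.
    pseudo_labels: List[float] = [reference.predict(x) for x in unlabeled_x]
    n_t, n_u = len(target_x), len(unlabeled_x)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        # Gradient of the supervised term on the labeled target pairs.
        for x, y in zip(target_x, target_y):
            err = model.predict(x) - y
            grad_w += 2.0 * err * x / n_t
            grad_b += 2.0 * err / n_t
        # Gradient of the pseudo-label term, scaled by lam.
        for x, y in zip(unlabeled_x, pseudo_labels):
            err = model.predict(x) - y
            grad_w += lam * 2.0 * err * x / n_u
            grad_b += lam * 2.0 * err / n_u
        model.w -= lr * grad_w
        model.b -= lr * grad_b
    return model


if __name__ == "__main__":
    reference = LinearAcousticModel(w=1.0, b=0.0)  # "pre-trained" on source data
    target_x, target_y = [1.0], [3.0]              # one labeled target pair
    unlabeled_x = [0.0, 2.0]                       # inputs with pseudo labels only
    plain = fine_tune(reference, target_x, target_y, unlabeled_x, lam=0.0)
    guided = fine_tune(reference, target_x, target_y, unlabeled_x, lam=1.0)
    print(round(plain.predict(2.0), 3))   # → 5.0
    print(round(guided.predict(2.0), 3))  # → 3.0
```

With a single labeled target pair, plain fine-tuning (`lam=0.0`) fits that pair exactly and drifts far from the reference on the unseen input, while the pseudo-label term pulls the fine-tuned weights back toward the reference model. In this toy setting that pull is the regularization effect against overfitting that the abstract describes.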

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling
