Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset

Ze Liu

arXiv:2310.04982 · cs.SD · October 10, 2023

TL;DR

This paper evaluates how transfer learning enhances deep learning-based Text-to-Speech models in low-resource, few-shot scenarios, aiming to improve voice quality with limited data and reduce training time.

Contribution

It provides a comprehensive analysis and experimental comparison of transfer learning effectiveness in TTS models on small, customized datasets, highlighting optimal approaches for low-resource conditions.

Findings

01. Transfer learning significantly improves TTS performance with limited data.

02. Certain models outperform others in low-resource settings.

03. Transfer learning reduces training time while maintaining high voice quality.

Abstract

Text-to-Speech (TTS) synthesis using deep learning relies on voice quality. Modern TTS models are advanced, but they need large amounts of data. Given the growing computational complexity of these models and the scarcity of large, high-quality datasets, this research focuses on transfer learning, especially on few-shot, low-resource, and customized datasets. In this research, "low-resource" specifically refers to situations where training data is limited, such as a small number of audio recordings and corresponding transcriptions for a particular language or dialect. This thesis is rooted in the pressing need to find TTS models that require less training time and fewer data samples, yet yield high-quality voice output. The research evaluates the transfer learning capabilities of state-of-the-art TTS models through a thorough technical analysis. It then conducts a hands-on…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing