Relational Data Selection for Data Augmentation of Speaker-dependent Multi-band MelGAN Vocoder

Yi-Chiao Wu; Cheng-Hung Hu; Hung-Shin Lee; Yu-Huai Peng; Wen-Chin Huang; Yu Tsao; Hsin-Min Wang; Tomoki Toda

arXiv:2106.05629 · eess.AS · June 11, 2021 · Interspeech

TL;DR

This paper introduces a data augmentation method for speaker-dependent vocoders that uses speaker-verification techniques to select utterances whose speaker identity is similar to the target speaker's, improving synthesis quality when only limited target data is available.

Contribution

It proposes a novel data selection approach that uses speaker similarity to augment the training data for speaker-dependent (SD) vocoder adaptation.

Findings

01 Enhanced speech quality with augmented data

02 Improved speaker similarity in synthesized speech

03 Effective adaptation with limited target data

Abstract

Nowadays, neural vocoders can generate very high-fidelity speech when a large amount of training data is available. Although a speaker-dependent (SD) vocoder usually outperforms a speaker-independent (SI) vocoder, it is impractical to collect a large amount of data from a specific target speaker for most real-world applications. To tackle the problem of limited target data, this paper proposes a data augmentation method based on the speaker representation and similarity measurement used in speaker verification. The proposed method selects utterances whose speaker identity is similar to the target speaker's from an external corpus, and then combines the selected utterances with the limited target data for SD vocoder adaptation. The evaluation results show that, compared with the vocoder adapted using only limited target data, the vocoder adapted using augmented data improves both the quality and…
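
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes every utterance has already been mapped to a fixed-length speaker embedding (for example, an x-vector from a speaker-verification model), and it scores each external utterance by cosine similarity to the mean of the target speaker's embeddings. The embedding type, the centroid, the cosine score, the top-k cutoff, and all function and utterance names are hypothetical illustrations, not the paper's confirmed configuration. The vocoder adaptation itself is out of scope here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: a speaker embedding is a plain sequence of floats.
Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length embeddings (assumed score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings: Sequence[Vector]) -> List[float]:
    """Average the target speaker's utterance embeddings into one centroid."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def select_similar_utterances(
    target: Dict[str, Vector],
    external: Dict[str, Vector],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Rank external utterances by similarity to the target centroid; keep top_k."""
    centroid = mean_embedding(list(target.values()))
    scored = [(utt_id, cosine_similarity(emb, centroid))
              for utt_id, emb in external.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_adaptation_set(
    target: Dict[str, Vector],
    external: Dict[str, Vector],
    top_k: int,
) -> List[str]:
    """Combine the limited target utterances with the selected external ones."""
    selected = select_similar_utterances(target, external, top_k)
    return sorted(target) + [utt_id for utt_id, _ in selected]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, purely illustrative.
    target = {"tgt_001": [1.0, 0.0, 0.0], "tgt_002": [0.9, 0.1, 0.0]}
    external = {
        "ext_a": [0.95, 0.05, 0.0],
        "ext_b": [0.0, 1.0, 0.0],
        "ext_c": [0.7, 0.0, 0.7],
    }
    print(build_adaptation_set(target, external, top_k=2))
    # → ['tgt_001', 'tgt_002', 'ext_a', 'ext_c']
```

A fixed top-k cutoff is only one possible policy. A similarity threshold, or a different scoring backend such as PLDA, could be substituted in `select_similar_utterances` without changing how the augmented set is assembled.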

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing