Data Efficient Voice Cloning for Neural Singing Synthesis

Merlijn Blaauw; Jordi Bonada; Ryunosuke Daido

arXiv:1902.07292 · cs.SD · February 21, 2019

TL;DR

This paper presents a data-efficient voice cloning method for neural singing synthesis. A multispeaker model is first trained on data from many speakers, then adapted to a new, unseen voice using a small amount of target data. The system is evaluated with listening tests across several use cases, languages, and kinds of data.

Contribution

The paper adapts a voice cloning technique from text-to-speech to singing synthesis, demonstrating voice adaptation from small amounts of target data.

Findings

01. Effective voice cloning with limited data

02. High-quality synthesis across multiple languages

03. Successful adaptation to unseen voices

Abstract

There are many use cases in singing synthesis where creating voices from small amounts of data is desirable. In text-to-speech there have been several promising results that apply voice cloning techniques to modern deep learning based models. In this work, we adapt one such technique to the case of singing synthesis. By leveraging data from many speakers to first create a multispeaker model, small amounts of target data can then efficiently adapt the model to new unseen voices. We evaluate the system using listening tests across a number of different use cases, languages and kinds of data.
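The two-stage idea in the abstract (train a multispeaker model, then adapt it with a small amount of target data) can be illustrated with a toy sketch. This is not the paper's architecture: the scalar "model", the names `predict`, `train`, and `make_data`, and all constants are assumptions made purely for illustration. The sketch also freezes the shared weight during adaptation, which is one possible choice; the abstract does not say which parameters the authors update.

```python
import random

def predict(w, emb, x):
    # The shared weight w stands in for what all voices have in common;
    # the per-speaker embedding emb shifts the output for one voice.
    return w * x + emb

def train(data, w, embs, lr=0.05, epochs=200, update_shared=True):
    """Plain SGD on squared error. data is a list of (speaker, x, y)."""
    for _ in range(epochs):
        for spk, x, y in data:
            err = predict(w, embs[spk], x) - y
            if update_shared:
                w -= lr * err * x
            embs[spk] -= lr * err
    return w, embs

def make_data(rng, spk, true_w, true_emb, n):
    # Hypothetical noiseless toy data: y = true_w * x + true_emb.
    samples = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        samples.append((spk, x, true_w * x + true_emb))
    return samples

rng = random.Random(0)
TRUE_W = 2.0
true_embs = {f"spk{i}": rng.uniform(-1.0, 1.0) for i in range(8)}

# Stage 1: multispeaker model trained on plenty of data from many speakers.
multi = []
for spk, emb in true_embs.items():
    multi += make_data(rng, spk, TRUE_W, emb, 50)
w, embs = train(multi, 0.0, {spk: 0.0 for spk in true_embs})

# Stage 2: adapt to an unseen voice from only three samples.
# The new embedding starts at the mean of the known ones, and the
# shared weight stays frozen (an assumption of this sketch).
TARGET_EMB = 0.7
few = make_data(rng, "target", TRUE_W, TARGET_EMB, 3)
embs["target"] = sum(embs.values()) / len(embs)
w, embs = train(few, w, embs, update_shared=False)

print("shared weight:", round(w, 4))
print("adapted target embedding:", round(embs["target"], 4))
```

Because the shared weight is already learned from the other speakers, the three target samples only need to pin down one new parameter. That is the intuition behind adapting a multispeaker model rather than training a new voice from scratch.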
