Data Efficient Voice Cloning for Neural Singing Synthesis
Merlijn Blaauw, Jordi Bonada, Ryunosuke Daido

TL;DR
This paper presents a data-efficient voice cloning method for neural singing synthesis. A multispeaker model is trained first and then adapted to a new voice using only a small amount of target data. The system is evaluated with listening tests across several languages, use cases, and kinds of data.
Contribution
It adapts a voice cloning technique from text-to-speech to singing synthesis and shows that a neural singing synthesizer can be adapted to a new voice from minimal data.
Findings
Effective voice cloning with limited data
High-quality synthesis across multiple languages
Successful adaptation to unseen voices
Abstract
There are many use cases in singing synthesis where creating voices from small amounts of data is desirable. In text-to-speech, there have been several promising results that apply voice cloning techniques to modern deep-learning-based models. In this work, we adapt one such technique to the case of singing synthesis. By leveraging data from many speakers to first create a multispeaker model, small amounts of target data can then efficiently adapt the model to new, unseen voices. We evaluate the system using listening tests across a number of different use cases, languages and kinds of data.
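The two-stage idea in the abstract can be illustrated with a deliberately tiny toy. First, a model with shared weights plus one embedding per speaker is trained on many speakers. Then a new voice is fitted from only a few samples. This is not the paper's model. The linear "acoustic model", the scalar speaker embedding, and the helper names (`make_speaker_data`, `pretrain`, `adapt`) are all hypothetical, chosen only to show why shared weights learned from many speakers make adaptation data-efficient. In this variant only the new speaker's embedding is fitted while the shared weights stay frozen. Also fine-tuning the shared weights is a common alternative.

```python
import random


def make_speaker_data(rng, offset, n, slope=2.0):
    """Hypothetical toy data: output depends on a shared slope plus a speaker-specific offset."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + offset) for x in xs]


def pretrain(datasets, steps=2000, lr=0.1):
    """Multispeaker stage: jointly learn the shared weight `a` and one scalar embedding per speaker."""
    a = 0.0
    emb = [0.0] * len(datasets)
    total = sum(len(d) for d in datasets)
    for _ in range(steps):
        grad_a = 0.0
        grad_e = [0.0] * len(datasets)
        for s, data in enumerate(datasets):
            for x, y in data:
                err = (a * x + emb[s]) - y  # prediction error for speaker s
                grad_a += err * x
                grad_e[s] += err
        # Gradient step on mean squared error (shared weight uses all data).
        a -= lr * 2 * grad_a / total
        for s, data in enumerate(datasets):
            emb[s] -= lr * 2 * grad_e[s] / len(data)
    return a, emb


def adapt(a, data, steps=500, lr=0.1):
    """Adaptation stage: keep the shared weight frozen and fit only a new speaker embedding."""
    e = 0.0
    for _ in range(steps):
        grad = sum((a * x + e) - y for x, y in data) / len(data)
        e -= lr * 2 * grad
    return e


rng = random.Random(0)
train_offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]  # five hypothetical training voices
train = [make_speaker_data(rng, off, 20) for off in train_offsets]
a, emb = pretrain(train)

target = make_speaker_data(rng, 0.7, 3)  # only three samples of the unseen voice
e_new = adapt(a, target)
print(f"shared weight {a:.3f}, new speaker embedding {e_new:.3f}")
```

Because the shared weight has already been estimated from all training voices, the three target samples only need to pin down a single embedding value. Training the whole model from scratch on three samples would be far less constrained.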
