SingIt! Singer Voice Transformation
Amit Eliav, Aaron Taub, Renana Opochinsky, Sharon Gannot

TL;DR
This paper introduces SingIt!, a system that converts ordinary speech into a singing voice using zero-shot, many-to-many style transfer. It is assembled from simple, modular components so that anyone can sing any song quickly.
Contribution
It presents a novel zero-shot, many-to-many style transfer model for singing voice generation from speech, combining simple modules for a complex task.
Findings
The system's speech-to-singing conversion was evaluated with a group of 25 non-expert listeners
Samples demonstrate the model's ability to produce singing voices
Modular approach simplifies the complex task of singing voice transformation
Abstract
In this paper, we propose a model that can generate a singing voice from a normal speech utterance by harnessing zero-shot, many-to-many style transfer learning. Our goal is to give anyone the opportunity to sing any song in a timely manner. We present a system comprising several available blocks, as well as a modified auto-encoder, and show how this highly complex challenge can be met by combining rather simple solutions. We demonstrate the applicability of the proposed system using a group of 25 non-expert listeners. Samples of the data generated by our model are provided.
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
