Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings

Wei Xue; Yiwen Wang; Qifeng Liu; Yike Guo

arXiv:2305.05401 · cs.SD · May 10, 2023 · 1 citation

TL;DR

This paper introduces an unsupervised variational auto-encoder framework that digitizes and controls virtual singing voices by learning from voice recordings, enabling creation and manipulation of virtual singers without labeled data.

Contribution

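It presents an unsupervised VAE-based approach that models and controls a person's singing voice from unlabeled, clean voice recordings of any content, and that can predict singing even from speaking recordings alone, avoiding the labor of annotating factors such as timbre, tempo, pitch, and lyrics.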

Findings

01 Effective digitization of singing voices from speech recordings

02 Ability to control and interpolate vocal characteristics

03 Successful generation of virtual singers with controllable features

Abstract

The virtual world is being established in which digital humans are created indistinguishable from real humans. Producing their audio-related capabilities is crucial since voice conveys extensive personal characteristics. We aim to create a controllable audio-form virtual singer; however, supervised modeling and controlling all different factors of the singing voice, such as timbre, tempo, pitch, and lyrics, is extremely difficult since accurately labeling all such information needs enormous labor work. In this paper, we propose a framework that could digitize a person's voice by simply "listening" to the clean voice recordings of any content in a fully unsupervised manner and predict singing voices even only using speaking recordings. A variational auto-encoder (VAE) based framework is developed, which leverages a set of pre-trained models to encode the audio as various hidden…

Code & Models

Models

🤗 xihan123/so-vits-svc-5.0-nine · model · ♡ 5

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis