# Unsupervised Singing Voice Conversion

**Authors:** Eliya Nachmani, Lior Wolf

arXiv: 1904.06590 · 2019-09-26

## TL;DR

This paper introduces an unsupervised deep learning approach for singing voice conversion that does not require lyrics, phonetic features, or paired samples, using a CNN encoder, WaveNet decoder, and singer embeddings.

## Contribution

It presents a novel unsupervised singing voice conversion model that employs a singer-agnostic encoder, a decoder conditioned on singer embeddings, and new training protocols for small datasets.
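The data flow this implies can be sketched in a few lines: one shared encoder produces a latent sequence, and the decoder turns that latent into audio using only the target singer's embedding. This is a toy illustration, not the paper's model. The functions `encode`, `decode`, and `convert`, the table `SINGER_EMBEDDINGS`, and the two-number embeddings are all hypothetical stand-ins for the CNN encoder, the WaveNet decoder, and the learned singer embeddings.

```python
from typing import Dict, List

Audio = List[float]
Latent = List[float]

# Hypothetical embeddings: one small vector per singer (here: gain, offset).
SINGER_EMBEDDINGS: Dict[str, List[float]] = {
    "singer_a": [1.0, 0.0],
    "singer_b": [0.5, 0.1],
}


def encode(audio: Audio) -> Latent:
    """Toy shared encoder: average adjacent sample pairs.

    It takes no singer identity as input, mirroring the idea of a single
    encoder used for all singers.
    """
    return [(audio[i] + audio[i + 1]) / 2.0 for i in range(0, len(audio) - 1, 2)]


def decode(latent: Latent, singer_embedding: List[float]) -> Audio:
    """Toy decoder: upsample by 2 and apply the singer-specific gain and offset."""
    gain, offset = singer_embedding
    out: Audio = []
    for z in latent:
        sample = gain * z + offset
        out.extend([sample, sample])
    return out


def convert(audio: Audio, target_singer: str) -> Audio:
    """Convert audio by decoding the shared latent with the target's embedding."""
    return decode(encode(audio), SINGER_EMBEDDINGS[target_singer])


source_audio = [0.2, 0.4, 0.6, 0.8]
as_a = convert(source_audio, "singer_a")
as_b = convert(source_audio, "singer_b")
```

The same latent feeds both conversions, so any difference between `as_a` and `as_b` comes from the embedding alone. That is the property the singer-agnostic encoder is meant to provide.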

## Key findings

- Converted voices sound natural and are highly recognizable as the target singer.
- The method works without supervision, using only audio data.
- Proposed data augmentation and training losses improve performance.

## Abstract

We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any form of supervision: no lyrics or any kind of phonetic features, no notes, and no matching samples between singers. The proposed network employs a single CNN encoder for all singers, a single WaveNet decoder, and a classifier that enforces the latent representation to be singer-agnostic. Each singer is represented by one embedding vector, which the decoder is conditioned on. In order to deal with relatively small datasets, we propose a new data augmentation scheme, as well as new training losses and protocols that are based on backtranslation. Our evaluation presents evidence that the conversion produces natural singing voices that are highly recognizable as the target singer.
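One common way to make a classifier push a latent representation toward being singer-agnostic is an adversarial, domain-confusion style objective: the classifier is trained to identify the singer from the latent, while the encoder and decoder are trained to reconstruct the audio and to make the classifier fail. The sketch below assumes that formulation. The function names, the weight `lam`, and the example numbers are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import List


def cross_entropy(probs: List[float], true_index: int) -> float:
    """Negative log-probability the classifier assigns to the true singer."""
    return -math.log(probs[true_index])


def classifier_objective(probs: List[float], true_singer: int) -> float:
    """The classifier minimizes its own cross-entropy (tries to name the singer)."""
    return cross_entropy(probs, true_singer)


def encoder_decoder_objective(
    recon_loss: float,
    probs: List[float],
    true_singer: int,
    lam: float = 0.01,  # hypothetical weight on the confusion term
) -> float:
    """Reconstruction loss minus a weighted classifier loss.

    Minimizing this rewards latents from which the singer is hard to predict.
    """
    return recon_loss - lam * cross_entropy(probs, true_singer)


# Classifier is sure of the singer: the latent still leaks identity.
leaky = encoder_decoder_objective(1.0, [0.99, 0.01], true_singer=0, lam=1.0)
# Classifier is at chance over two singers: the latent hides identity.
agnostic = encoder_decoder_objective(1.0, [0.5, 0.5], true_singer=0, lam=1.0)
```

With equal reconstruction loss, the encoder and decoder objective is lower when the classifier is at chance (`agnostic`) than when it is confident (`leaky`), so minimizing it favors latents that carry no singer identity.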

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.06590/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1904.06590/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1904.06590/full.md

---
Source: https://tomesphere.com/paper/1904.06590