MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms

Marco Pasini

arXiv:1910.03713 · eess.AS · December 6, 2019 · 31 cites

TL;DR

MelGAN-VC is a GAN-based method for voice conversion and audio style transfer that trains on non-parallel data and converts arbitrarily long samples of speech or music, preserving the source content while modeling the target style.

Contribution

The paper presents a non-parallel voice conversion framework that combines a GAN with a siamese network, and extends it to music style transfer on audio of arbitrary length.

Findings

01 · Effective voice conversion with non-parallel data

02 · Successful style transfer for speech and music

03 · Works on arbitrarily long audio samples

Abstract

Traditional voice conversion methods rely on parallel recordings of multiple speakers pronouncing the same sentences. For real-world applications, however, parallel data is rarely available. We propose MelGAN-VC, a voice conversion method that relies on non-parallel speech data and is able to convert audio signals of arbitrary length from a source voice to a target voice. We first compute spectrograms from waveform data and then perform a domain translation using a Generative Adversarial Network (GAN) architecture. An additional siamese network helps preserve speech information in the translation process, without sacrificing the ability to flexibly model the style of the target speaker. We test our framework with a dataset of clean speech recordings, as well as with a collection of noisy real-world speech examples. Finally, we apply the same method to perform music style transfer,…
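The first two steps named in the abstract (computing a spectrogram from a waveform, then translating a sample of any length) can be illustrated with a small sketch. This is not the paper's implementation. The fixed-width chunking strategy, the parameter names `n_fft`, `hop`, and `chunk`, and the `identity_generator` placeholder standing in for a trained GAN generator are all assumptions made for illustration.

```python
import cmath
import math


def spectrogram(wave, n_fft=64, hop=16):
    """Log-magnitude spectrogram, returned as a list of frames (time-major)."""
    # Hann window to reduce spectral leakage in each frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(wave) - n_fft + 1, hop):
        seg = [wave[start + n] * window[n] for n in range(n_fft)]
        mags = []
        # Naive DFT over the non-negative frequency bins (slow but dependency-free).
        for k in range(n_fft // 2 + 1):
            acc = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            mags.append(math.log(abs(acc) + 1e-6))
        frames.append(mags)
    return frames


def convert_long(spec, generator, chunk=8):
    """Apply `generator` to fixed-width chunks of frames and stitch the results.

    ASSUMPTION: chunk-wise processing is one simple way to handle inputs of
    arbitrary length; the source does not state how the paper does it.
    """
    n_frames = len(spec)
    n_bins = len(spec[0])
    pad = (-n_frames) % chunk  # frames needed to reach a multiple of `chunk`
    silence = [math.log(1e-6)] * n_bins
    padded = spec + [list(silence) for _ in range(pad)]
    out = []
    for i in range(0, len(padded), chunk):
        out.extend(generator(padded[i:i + chunk]))
    return out[:n_frames]  # drop the padding again


def identity_generator(chunk_frames):
    """Hypothetical stand-in for a trained generator: returns its input unchanged."""
    return [list(frame) for frame in chunk_frames]


# One-eighth of a second of a 440 Hz tone at an 8 kHz sample rate.
wave = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1000)]
spec = spectrogram(wave)
converted = convert_long(spec, identity_generator)
```

Because the stitched output is trimmed back to the input's frame count, the same function handles short and long inputs alike. A real system would replace `identity_generator` with a trained model and would need a separate step to turn the converted spectrogram back into a waveform.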

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Test · Siamese Network