StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Yinghao Aaron Li; Ali Zare; Nima Mesgarani

arXiv:2107.10394 · cs.SD · July 26, 2021

TL;DR

StarGANv2-VC introduces an unsupervised, non-parallel, many-to-many voice conversion framework. It produces natural, high-quality speech across several tasks, including any-to-many, cross-lingual, singing, and stylistic conversion, and it runs in real time.

Contribution

It presents a GAN-based voice conversion model built on StarGAN v2. Although the model is trained on only 20 English speakers, it generalizes to multiple conversion scenarios without parallel training data.

Findings

01. Outperforms previous VC models in naturalness and sound quality.
02. Generalizes to cross-lingual and stylistic voice conversion.
03. Operates in real time with a faster-than-real-time vocoder.
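Real-time factor (RTF) is a common way to quantify the real-time claim. It is usually defined as processing time divided by the duration of the audio produced, so an RTF below 1 means faster than real time. The minimal Python sketch below assumes that definition. The function names and the timings in the example are hypothetical and are not measurements from the paper.

```python
from typing import Iterable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; < 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def pipeline_rtf(stage_seconds: Iterable[float], audio_seconds: float) -> float:
    """RTF of stages run back to back (e.g. conversion model, then vocoder).

    Hypothetical helper: the whole pipeline is real time only if the summed
    stage times stay below the audio duration, not just each stage alone.
    """
    return real_time_factor(sum(stage_seconds), audio_seconds)


def is_faster_than_real_time(rtf: float) -> bool:
    return rtf < 1.0


# Illustrative numbers only: 2 s of audio, 0.25 s spent in each of two stages.
rtf = pipeline_rtf([0.25, 0.25], audio_seconds=2.0)
print(rtf, is_faster_than_real_time(rtf))  # 0.25 True
```

A fast vocoder alone does not make the system real time. The conversion model's time counts toward the same budget.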

Abstract

We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2. Using a combination of adversarial source classifier loss and perceptual loss, our model significantly outperforms previous VC models. Although our model is trained only with 20 English speakers, it generalizes to a variety of voice conversion tasks, such as any-to-many, cross-lingual, and singing conversion. Using a style encoder, our framework can also convert plain reading speech into stylistic speech, such as emotional and falsetto speech. Subjective and objective evaluation experiments on a non-parallel many-to-many voice conversion task revealed that our model produces natural-sounding voices, close to the sound quality of state-of-the-art text-to-speech (TTS) based voice conversion methods without the need for text labels. Moreover, our…
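The abstract credits much of the gain to an adversarial source classifier loss. As we read it, a classifier tries to identify the source speaker of converted speech, while the generator tries to make that classifier predict the target speaker instead. The exact formulation is given in the paper body rather than this abstract, as are the perceptual losses it is combined with, so treat this Python sketch as an assumption-laden illustration. It uses plain softmax cross-entropy on hypothetical logits. The function names are ours and do not come from the official repositories.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, using log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def classifier_loss(logits_on_converted: Sequence[float], source_speaker: int) -> float:
    # Assumed objective for the source classifier C: recover the *source*
    # speaker from converted speech.
    return cross_entropy(logits_on_converted, source_speaker)


def generator_adv_cls_loss(logits_on_converted: Sequence[float], target_speaker: int) -> float:
    # Assumed objective for the generator G: make C assign its output to the
    # *target* speaker, i.e. leave no detectable trace of the source speaker.
    return cross_entropy(logits_on_converted, target_speaker)


# Hypothetical classifier logits over 3 speakers for one converted utterance.
logits = [2.0, 0.5, -1.0]
source, target = 0, 1
print(classifier_loss(logits, source) < generator_adv_cls_loss(logits, target))  # True
```

Here the logits still favor the source speaker, so the classifier's loss is low and the generator's loss is high. Training G would push these logits toward the target speaker.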

Taxonomy

Methods: Phase Shuffle · Convolution · Tanh Activation · Dense Connections · WGAN-GP Loss · Dropout · WaveGAN