StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

Hirokazu Kameoka; Takuhiro Kaneko; Kou Tanaka; Nobukatsu Hojo

arXiv:1806.02169 · cs.SD · July 2, 2018 · 48 cites

TL;DR

This paper introduces StarGAN-VC, a non-parallel many-to-many voice conversion method that uses a single GAN generator, needs only several minutes of training speech, converts quickly enough for real-time use, and outperforms a previous autoencoder-based approach in sound quality and speaker similarity.

Contribution

The paper presents StarGAN-VC, a GAN-based voice conversion framework that learns mappings among multiple speaker domains with a single generator and without parallel data. It converts speech quickly enough for real-time processing and needs only several minutes of training examples.

Findings

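01. Higher sound quality than the state-of-the-art baseline in subjective evaluation

02. Better speaker similarity than the same baseline in converted speech

03. Conversion fast enough for real-time implementation, with only several minutes of training examples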

Abstract

This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN. Our method, which we call StarGAN-VC, is noteworthy in that it (1) requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training, (2) simultaneously learns many-to-many mappings across different attribute domains using a single generator network, (3) is able to generate converted speech signals quickly enough to allow real-time implementations and (4) requires only several minutes of training examples to generate reasonably realistic-sounding speech. Subjective evaluation experiments on a non-parallel many-to-many speaker identity conversion task revealed that the proposed method obtained higher sound quality and speaker similarity than a state-of-the-art method based on…
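Point (2) of the abstract, a single generator covering every source-to-target mapping, can be illustrated with a toy sketch. It assumes the StarGAN-style convention of feeding the generator a target-domain code alongside the input features; the function names, feature sizes, and the choice to append a one-hot code to each frame are illustrative assumptions, not the paper's implementation.

```python
from itertools import permutations


def one_hot(index, num_domains):
    """Return a one-hot list marking the target domain (e.g. a speaker)."""
    if not 0 <= index < num_domains:
        raise ValueError("domain index out of range")
    return [1.0 if i == index else 0.0 for i in range(num_domains)]


def condition_frames(frames, target_index, num_domains):
    """Append the target-domain code to every acoustic feature frame.

    Hypothetical helper: a conditioned generator would consume these
    frames, so one network can serve any target domain.
    """
    code = one_hot(target_index, num_domains)
    return [list(frame) + code for frame in frames]


def directed_pairs(num_domains):
    """Count the source->target pairs that separate one-to-one models would need."""
    return len(list(permutations(range(num_domains), 2)))


# Two toy 3-dimensional feature frames, converted toward domain 2 of 4.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
conditioned = condition_frames(frames, target_index=2, num_domains=4)
pairs_for_four_speakers = directed_pairs(4)
```

With four speakers, one-to-one models would have to cover 12 directed pairs, whereas a conditioned generator only changes the code it receives.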

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing