Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
Takuhiro Kaneko, Hirokazu Kameoka

TL;DR
This paper introduces CycleGAN-VC, a voice conversion method that learns a mapping between source and target speech from unpaired data, with no parallel dataset required, while avoiding over-smoothing and maintaining high quality.
Contribution
It presents a parallel-data-free voice conversion approach using cycle-consistent adversarial networks with gated CNNs, improving quality and avoiding over-smoothing compared to traditional methods.
Findings
Converted speech quality is comparable to parallel-data-based methods.
Objective metrics indicate near-natural feature sequences.
The method effectively learns mappings from unpaired data.
Abstract
We propose a parallel-data-free voice-conversion (VC) method that can learn a mapping from source to target speech without relying on parallel data. The proposed method is general purpose, high quality, and parallel-data free and works without any extra data, modules, or alignment procedure. It also avoids over-smoothing, which occurs in many conventional statistical model-based VC methods. Our method, called CycleGAN-VC, uses a cycle-consistent adversarial network (CycleGAN) with gated convolutional neural networks (CNNs) and an identity-mapping loss. A CycleGAN learns forward and inverse mappings simultaneously using adversarial and cycle-consistency losses. This makes it possible to find an optimal pseudo pair from unpaired data. Furthermore, the adversarial loss contributes to reducing over-smoothing of the converted feature sequence. We configure a CycleGAN with gated CNNs and…
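The cycle-consistency and identity-mapping losses named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the generators `g_xy` and `g_yx` are hypothetical toy functions standing in for the gated-CNN networks, the feature vectors are made-up numbers, the L1 distance is an assumed choice of distance, and the adversarial term is omitted.

```python
# Illustrative sketch only: toy stand-ins, not the paper's gated-CNN generators.
from typing import Callable, List

Vector = List[float]
Generator = Callable[[Vector], Vector]


def l1(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Vector, y: Vector,
                           g_xy: Generator, g_yx: Generator) -> float:
    """Round-trip error: x -> g_xy -> g_yx should return x, and likewise for y."""
    return l1(g_yx(g_xy(x)), x) + l1(g_xy(g_yx(y)), y)


def identity_mapping_loss(x: Vector, y: Vector,
                          g_xy: Generator, g_yx: Generator) -> float:
    """Penalize a generator for altering input that is already in its target domain."""
    return l1(g_xy(y), y) + l1(g_yx(x), x)


# Hypothetical toy generators: the "conversion" is a shift by +1 or -1.
def g_xy(v: Vector) -> Vector:
    return [t + 1.0 for t in v]


def g_yx(v: Vector) -> Vector:
    return [t - 1.0 for t in v]


x = [0.0, 1.0, 2.0]  # made-up source-domain features
y = [1.0, 2.0, 3.0]  # made-up target-domain features

cyc = cycle_consistency_loss(x, y, g_xy, g_yx)
idt = identity_mapping_loss(x, y, g_xy, g_yx)
print(cyc)  # 0.0: the two toy mappings are exact inverses
print(idt)  # 2.0: each toy generator shifts in-domain input by 1
```

In this toy setup the round trip is lossless, so the cycle term is zero, while the identity term is nonzero because each generator still modifies input that already belongs to its output domain.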
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Batch Normalization · Residual Connection · PatchGAN · Tanh Activation · Residual Block · Instance Normalization · Convolution · Sigmoid Activation
