CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion

Takuhiro Kaneko; Hirokazu Kameoka; Kou Tanaka; Nobukatsu Hojo

arXiv:2010.11672 · cs.SD · October 23, 2020 · 5 cites

TL;DR

This paper introduces CycleGAN-VC3, an improved non-parallel voice conversion method that converts mel-spectrograms by incorporating time-frequency adaptive normalization (TFAN). It outperforms previous CycleGAN-VC models.

Contribution

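CycleGAN-VC3 improves mel-spectrogram conversion by integrating time-frequency adaptive normalization (TFAN) into CycleGAN-VC2. This addresses the loss of time-frequency structure seen when previous CycleGAN-VC models are applied to mel-spectrograms, and it yields improved naturalness and similarity.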
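
To make the role of TFAN concrete, the following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes TFAN behaves like instance normalization whose scale and bias vary over time and frequency: each channel of a feature map is normalized by its own mean and standard deviation, then scaled and shifted element-wise. In the method, the scale and bias are meant to be derived from the source mel-spectrogram by learned layers; here they are simply passed in as nested lists. The names tfan, gamma, beta, and eps are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev


def tfan(feature, gamma, beta, eps=1e-5):
    """Normalize each channel, then modulate it per time-frequency bin.

    feature, gamma, beta: nested lists shaped [channel][frequency][time].
    gamma and beta are assumed inputs here; in a trained model they would
    be predicted from the source mel-spectrogram (assumption of this sketch).
    """
    out = []
    for channel, g_channel, b_channel in zip(feature, gamma, beta):
        # Channel-wise statistics, as in instance normalization.
        flat = [v for row in channel for v in row]
        mu = fmean(flat)
        sigma = (pstdev(flat) ** 2 + eps) ** 0.5
        # Element-wise scale and bias: one value per time-frequency bin.
        out.append([
            [g * (v - mu) / sigma + b for v, g, b in zip(row, g_row, b_row)]
            for row, g_row, b_row in zip(channel, g_channel, b_channel)
        ])
    return out


# Toy example: one channel, two frequency bins, two time frames.
feature = [[[1.0, 2.0], [3.0, 4.0]]]
gamma = [[[1.0, 1.0], [1.0, 1.0]]]  # identity scale
beta = [[[0.0, 0.0], [0.0, 0.0]]]   # zero bias
normalized = tfan(feature, gamma, beta)
```

Because gamma and beta carry one value per time-frequency bin, information from the source can be reintroduced at every position, rather than through a single scale and bias per channel.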

Findings

01. CycleGAN-VC3 outperforms previous models in subjective evaluations.

02. Incorporating TFAN improves preservation of time-frequency structure.

03. CycleGAN-VC3 is effective for both inter-gender and intra-gender voice conversion.

Abstract

Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown promising results regarding this problem and have been widely used as benchmark methods. However, owing to the ambiguity of the effectiveness of CycleGAN-VC/VC2 for mel-spectrogram conversion, they are typically used for mel-cepstrum conversion even when comparative methods employ mel-spectrogram as a conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we discovered that their direct applications compromised the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing