CinC-GAN for Effective F0 prediction for Whisper-to-Normal Speech Conversion
Maitreya Patel, Mirali Purohit, Jui Shah, and Hemant A. Patil

TL;DR
This paper introduces CinC-GAN, a novel GAN-based approach that improves fundamental frequency prediction in whisper-to-normal speech conversion, outperforming previous CycleGAN methods in both objective and subjective evaluations.
Contribution
The paper proposes CinC-GAN, a new GAN architecture designed specifically to enhance F0 prediction accuracy without compromising MCC mapping in voice conversion.
Findings
CinC-GAN significantly outperforms CycleGAN in F0 prediction.
CinC-GAN shows superior results on unseen speakers.
Objective and subjective tests confirm the effectiveness of CinC-GAN.
Abstract
Generative Adversarial Network (GAN)-based methods have recently shown remarkable performance in Voice Conversion and WHiSPer-to-normal SPeeCH (WHSP2SPCH) conversion. One of the key challenges in WHSP2SPCH conversion is the prediction of fundamental frequency (F0). A state-of-the-art method based on Cycle-Consistent Generative Adversarial Networks (CycleGAN) was recently proposed for WHSP2SPCH conversion. The CycleGAN-based method uses two separate models, one for Mel Cepstral Coefficients (MCC) mapping and another for F0 prediction, where F0 prediction depends heavily on the pre-trained MCC mapping model. This dependence introduces additional non-linear noise in the predicted F0. To suppress this noise, we propose Cycle-in-Cycle GAN (i.e., CinC-GAN). It is specifically designed to make F0 prediction more effective without losing accuracy in MCC mapping. We evaluated the proposed method…
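To make the two loss terms behind a CycleGAN-style baseline concrete, the following is a minimal sketch in plain Python of a cycle-consistency loss and the least-squares GAN losses. It is a generic illustration, not the CinC-GAN objective, which the abstract does not spell out. All function names, the toy linear mappings, and the 0.5 scaling on the discriminator loss are assumptions made for this example; real systems would use neural networks over MCC and F0 features.

```python
# Illustrative sketch of generic CycleGAN-style loss terms.
# Assumption: this is NOT the authors' CinC-GAN objective; names are hypothetical.

def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x, g_forward, g_backward):
    """L1 distance between x and its round trip g_backward(g_forward(x))."""
    return l1_distance(x, g_backward(g_forward(x)))


def lsgan_generator_loss(d_scores_fake):
    """Least-squares generator loss: push discriminator scores on fakes toward 1."""
    return sum((s - 1.0) ** 2 for s in d_scores_fake) / len(d_scores_fake)


def lsgan_discriminator_loss(d_scores_real, d_scores_fake):
    """Least-squares discriminator loss: real scores toward 1, fake scores toward 0.

    The 0.5 factor is one common scaling convention (an assumption here).
    """
    real_term = sum((s - 1.0) ** 2 for s in d_scores_real) / len(d_scores_real)
    fake_term = sum(s ** 2 for s in d_scores_fake) / len(d_scores_fake)
    return 0.5 * (real_term + fake_term)


# Toy stand-ins for the two generators (hypothetical, purely linear).
def whisper_to_speech(v):
    return [2.0 * x + 1.0 for x in v]


def speech_to_whisper(v):
    return [(x - 1.0) / 2.0 for x in v]      # exact inverse of whisper_to_speech


def imperfect_speech_to_whisper(v):
    return [x / 2.0 for x in v]              # not an inverse, so the cycle loss is non-zero


features = [0.5, -1.0, 2.0]
perfect_cycle = cycle_consistency_loss(features, whisper_to_speech, speech_to_whisper)
imperfect_cycle = cycle_consistency_loss(features, whisper_to_speech, imperfect_speech_to_whisper)
```

With the exact inverse the round trip reproduces the input and the cycle loss is zero; with the imperfect inverse the loss grows, which is the signal a cycle-consistent model uses to keep the two mappings aligned.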
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Convolution · PatchGAN · GAN Least Squares Loss · Tanh Activation · Cycle Consistency Loss · Instance Normalization · Batch Normalization · Residual Connection
