StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings

Arnab Das; Suhita Ghosh; Tim Polzehl; Sebastian Stober

arXiv:2309.07592·eess.AS·September 15, 2023

TL;DR

This paper enhances voice conversion by introducing emotion-aware techniques to better preserve emotional content, addressing limitations of previous GAN-based methods like StarGANv2-VC.

Contribution

It proposes novel emotion-aware loss functions and an unsupervised approach to disentangle speaker and emotion representations in voice conversion.

Findings

01 Improved emotion preservation in voice conversion

02 Effective disentanglement of speaker and emotion features

03 Robust performance across diverse datasets and emotions

Abstract

Voice conversion (VC) transforms an utterance to sound like another person without changing the linguistic content. A recently proposed generative adversarial network-based VC method, StarGANv2-VC, is very successful in generating natural-sounding conversions. However, the method fails to preserve the emotion of the source speaker in the converted samples. Emotion preservation is necessary for natural human-computer interaction. In this paper, we show that StarGANv2-VC fails to disentangle the speaker and emotion representations, which is necessary for preserving emotion. Specifically, emotion leaks from the reference audio used to capture the speaker embeddings during training. To counter the problem, we propose novel emotion-aware losses and an unsupervised method that exploits emotion supervision through latent emotion representations. The objective and subjective evaluations prove…
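The abstract does not give the form of the emotion-aware losses. As a purely illustrative sketch, and not the authors' formulation, one generic way to penalize emotion drift is to compare a latent emotion embedding of the source utterance with that of the converted utterance and minimize their cosine distance. The function names below are hypothetical, the choice of cosine distance is an assumption, and the plain lists stand in for the output of some pretrained emotion encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def emotion_consistency_loss(source_emb, converted_emb):
    """Hypothetical emotion-aware penalty (an assumption, not the paper's loss).

    Returns the cosine distance between the emotion embedding of the source
    utterance and that of the converted utterance: 0 when they point in the
    same direction, up to 2 when they point in opposite directions.
    """
    return 1.0 - cosine_similarity(source_emb, converted_emb)


# Toy embeddings standing in for an emotion encoder's output.
source = [0.9, 0.1, 0.0]
converted = [0.8, 0.2, 0.1]
loss = emotion_consistency_loss(source, converted)
```

In a training loop, a term of this kind would be added to the generator objective with some weight, so that conversions whose emotion embedding drifts from the source are penalized regardless of the target speaker.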

Code & Models

Repositories

arnabdas8901/StarGAN-VC_PlusPlus (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing