VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

Kun Zhou; Berrak Sisman; Haizhou Li

arXiv:2011.02314·cs.SD·November 5, 2020

TL;DR

This paper proposes a speaker-dependent emotional voice conversion framework based on VAW-GAN that disentangles and recomposes the emotional elements of speech, changing the expressed emotion while preserving linguistic content and speaker identity.

Contribution

It proposes two VAW-GAN pipelines, one for spectrum conversion and one for prosody conversion, with the spectral decoder conditioned on the output of the prosodic pipeline at run-time.

Findings

01

Effective emotional voice conversion demonstrated in objective metrics.

02

Subjective evaluations show improved emotional expressiveness.

03

Framework preserves linguistic content and speaker identity.

Abstract

Emotional voice conversion (EVC) aims to convert the emotion of speech from one state to another while preserving the linguistic content and speaker identity. In this paper, we study the disentanglement and recomposition of emotional elements in speech through a variational autoencoding Wasserstein generative adversarial network (VAW-GAN). We propose a speaker-dependent EVC framework based on VAW-GAN that includes two VAW-GAN pipelines, one for spectrum conversion and another for prosody conversion. We train a spectral encoder that disentangles emotion and prosody (F0) information from spectral features; we also train a prosodic encoder that disentangles emotion modulation of prosody (affective prosody) from linguistic prosody. At run-time, the decoder of the spectral VAW-GAN is conditioned on the output of the prosodic VAW-GAN. The vocoder takes the converted spectral and prosodic features to…
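The run-time data flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the random linear maps stand in for trained encoders and decoders, and the feature sizes, the one-hot emotion code, and every name here (`make_linear`, `convert`, `SPEC_DIM`, and so on) are hypothetical. The adversarial training and the vocoder are left out entirely.

```python
import random

random.seed(0)

# All sizes below are hypothetical and chosen only to keep the sketch small.
N_EMOTIONS = 2   # number of emotion classes
SPEC_DIM = 4     # spectral feature size per frame
F0_DIM = 1       # prosodic (F0) feature size per frame
LATENT_DIM = 3   # latent code size per frame


def make_linear(in_dim, out_dim):
    """Return a random linear map standing in for a trained network."""
    weights = [[random.gauss(0.0, 0.1) for _ in range(out_dim)]
               for _ in range(in_dim)]

    def apply(frames):
        # frames: list of per-frame feature lists, each of length in_dim
        return [[sum(x * weights[i][j] for i, x in enumerate(frame))
                 for j in range(out_dim)]
                for frame in frames]

    return apply


def one_hot(emotion_id):
    """Assumed emotion code: a one-hot vector over emotion classes."""
    return [1.0 if k == emotion_id else 0.0 for k in range(N_EMOTIONS)]


# One encoder/decoder pair per pipeline (placeholders for trained models).
prosodic_encoder = make_linear(F0_DIM, LATENT_DIM)
prosodic_decoder = make_linear(LATENT_DIM + N_EMOTIONS, F0_DIM)
spectral_encoder = make_linear(SPEC_DIM, LATENT_DIM)
spectral_decoder = make_linear(LATENT_DIM + N_EMOTIONS + F0_DIM, SPEC_DIM)


def convert(spectrum, f0, target_emotion):
    """Convert per-frame spectral and F0 features to the target emotion."""
    emotion = one_hot(target_emotion)

    # Prosody pipeline: encode F0, then decode with the target emotion code.
    z_prosody = prosodic_encoder(f0)
    f0_converted = prosodic_decoder([z + emotion for z in z_prosody])

    # Spectrum pipeline: the decoder is conditioned on the emotion code
    # and on the output of the prosody pipeline.
    z_spectrum = spectral_encoder(spectrum)
    decoder_input = [z + emotion + p
                     for z, p in zip(z_spectrum, f0_converted)]
    spectrum_converted = spectral_decoder(decoder_input)

    # Both outputs would then be passed to a vocoder (not shown).
    return spectrum_converted, f0_converted
```

Calling `convert(spectrum, f0, 1)` on five frames of input returns five frames of converted spectral features and five frames of converted F0. The point of the sketch is the ordering: prosody is converted first, and its output becomes a conditioning input to the spectral decoder.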
