Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator
Ravi Shankar, Jacob Sager, Archana Venkataraman

TL;DR
This paper presents a speech emotion conversion method that needs no parallel training data. It uses a variational cycle-GAN with a hybrid generator and generalizes to new speakers.
Contribution
The paper introduces the variational cycle-GAN (VC-GAN), which combines a hybrid generator with an adversarial pair discriminator to convert emotion in speech without parallel data.
Findings
Crowd-sourced evaluations demonstrate effective emotion conversion.
The model generalizes to new speakers, shown by modifying speech produced by WaveNet.
The hybrid generator architecture regularizes adversarial training.
Abstract
We introduce a novel method for emotion conversion in speech that does not require parallel training data. Our approach loosely relies on a cycle-GAN schema to minimize the reconstruction error from converting back and forth between emotion pairs. However, unlike the conventional cycle-GAN, our discriminator classifies whether a pair of input real and generated samples corresponds to the desired emotion conversion (e.g., A to B) or to its inverse (B to A). We will show that this setup, which we refer to as a variational cycle-GAN (VC-GAN), is equivalent to minimizing the empirical KL divergence between the source features and their cyclic counterpart. In addition, our generator combines a trainable deep network with a fixed generative block to implement a smooth and invertible transformation on the input features, in our case, the fundamental frequency (F0) contour. This hybrid…
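The pair-discriminator and cycle-reconstruction ideas described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The toy `generator`, the linear `pair_discriminator`, the pair ordering, the label convention (1 for A to B, 0 for B to A), and the L1 cycle loss are all assumptions chosen for illustration, and the contours are random stand-ins for normalized F0 features.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(f0, weights):
    """Hypothetical stand-in for a generator: adds a smooth offset to a contour."""
    # A moving average keeps the predicted offset smooth over time.
    kernel = np.ones(5) / 5.0
    offset = np.convolve(np.tanh(f0 * weights), kernel, mode="same")
    return f0 + offset

def pair_discriminator(first, second, w):
    """Hypothetical pair discriminator: probability that the pair is an A-to-B conversion."""
    features = np.concatenate([first, second])
    return 1.0 / (1.0 + np.exp(-(features @ w)))

def bce(p, label):
    """Binary cross-entropy for a single probability."""
    eps = 1e-12
    return -(label * np.log(p + eps) + (1.0 - label) * np.log(1.0 - p + eps))

def cycle_loss(f0_a, w_ab, w_ba):
    """Mean absolute error between a contour and its A -> B -> A reconstruction."""
    f0_b_fake = generator(f0_a, w_ab)
    f0_a_cycle = generator(f0_b_fake, w_ba)
    return np.mean(np.abs(f0_a - f0_a_cycle))

T = 64                                   # frames per contour (assumed)
f0_a = rng.normal(size=T)                # stand-in for emotion-A features
f0_b = rng.normal(size=T)                # stand-in for emotion-B features
w_ab = rng.normal(scale=0.1, size=T)     # toy parameters of the A -> B generator
w_ba = rng.normal(scale=0.1, size=T)     # toy parameters of the B -> A generator
w_d = rng.normal(scale=0.1, size=2 * T)  # toy discriminator parameters

fake_b = generator(f0_a, w_ab)
fake_a = generator(f0_b, w_ba)

# The discriminator sees (real, generated) pairs and labels the conversion direction.
p_ab = pair_discriminator(f0_a, fake_b, w_d)  # pair from an A -> B conversion
p_ba = pair_discriminator(f0_b, fake_a, w_d)  # pair from a B -> A conversion
d_loss = bce(p_ab, 1.0) + bce(p_ba, 0.0)
c_loss = cycle_loss(f0_a, w_ab, w_ba)
```

In this sketch the discriminator never judges a single sample as real or fake; it only decides which conversion direction produced a pair, which is the distinction the abstract draws from a conventional cycle-GAN.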
