Multi-speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network

Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman

arXiv:2007.12937·eess.AS·August 12, 2020

TL;DR

This paper introduces a chained encoder-decoder-predictor neural network for speech emotion conversion. Its latent embedding is regularized with LDDMM, and it outperforms existing state-of-the-art methods on both emotion saliency and the quality of resynthesized speech.

Contribution

It presents a chained encoder-decoder-predictor model with LDDMM regularization for improved emotion conversion and out-of-sample generalization in speech synthesis.

Findings

01

Outperforms state-of-the-art in emotion saliency and speech quality

02

Enables conversion of phrases not seen in the training data

03

Demonstrates effective latent embedding regularization

Abstract

We propose a novel method for emotion conversion in speech based on a chained encoder-decoder-predictor neural network architecture. The encoder constructs a latent embedding of the fundamental frequency (F0) contour and the spectrum, which we regularize using the Large Diffeomorphic Metric Mapping (LDDMM) registration framework. The decoder uses this embedding to predict the modified F0 contour in a target emotional class. Finally, the predictor uses the original spectrum and the modified F0 contour to generate a corresponding target spectrum. Our joint objective function simultaneously optimizes the parameters of the three model blocks. We show that our method outperforms the existing state-of-the-art approaches on both the saliency of emotion conversion and the quality of resynthesized speech. In addition, the LDDMM regularization allows our model to convert phrases that were not…
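
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single linear layers, the dimensions (`T`, `N_SPEC`, `N_LATENT`), the mean-squared-error terms, and the weight `lam` are all assumptions made for illustration. In particular, the latent penalty here is a plain squared-norm stand-in and does not implement LDDMM regularization.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50        # frames per utterance (hypothetical)
N_SPEC = 23   # spectral features per frame (hypothetical)
N_LATENT = 8  # latent embedding size per frame (hypothetical)


def init_linear(n_in, n_out):
    """Small random weights and a zero bias for one linear layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


# One toy linear layer per block; the real blocks are presumably deeper.
params = {
    "encoder": init_linear(1 + N_SPEC, N_LATENT),
    "decoder": init_linear(N_LATENT, 1),
    "predictor": init_linear(N_SPEC + 1, N_SPEC),
}


def linear(x, layer):
    w, b = layer
    return x @ w + b


def forward(f0, spec, params):
    """Chain the three blocks: encoder -> decoder -> predictor."""
    # Encoder: latent embedding of the F0 contour and the spectrum.
    z = np.tanh(linear(np.concatenate([f0, spec], axis=1), params["encoder"]))
    # Decoder: modified F0 contour predicted from the embedding.
    f0_hat = linear(z, params["decoder"])
    # Predictor: target spectrum from the original spectrum and modified F0.
    spec_hat = linear(np.concatenate([spec, f0_hat], axis=1), params["predictor"])
    return z, f0_hat, spec_hat


def joint_loss(f0, spec, f0_tgt, spec_tgt, params, lam=0.1):
    """Single objective that depends on the parameters of all three blocks."""
    z, f0_hat, spec_hat = forward(f0, spec, params)
    f0_term = np.mean((f0_hat - f0_tgt) ** 2)
    spec_term = np.mean((spec_hat - spec_tgt) ** 2)
    latent_term = np.mean(z ** 2)  # stand-in penalty, NOT LDDMM
    return f0_term + spec_term + lam * latent_term


# Toy source and target utterances filled with random values.
f0 = rng.normal(size=(T, 1))
spec = rng.normal(size=(T, N_SPEC))
f0_tgt = rng.normal(size=(T, 1))
spec_tgt = rng.normal(size=(T, N_SPEC))

z, f0_hat, spec_hat = forward(f0, spec, params)
loss = joint_loss(f0, spec, f0_tgt, spec_tgt, params)
```

Because the predictor consumes the decoder's output, an error in the modified F0 contour also affects the spectrum term, which is one reason a joint objective over all three blocks is a natural fit for a chained design.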
