Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks

Zhaojie Luo, Shoufeng Lin, Rui Liu, Jun Baba, Yuichiro Yoshikawa, and Ishiguro Hiroshi

arXiv:2110.01164 · eess.AS · October 5, 2021

TL;DR

This paper introduces a source-filter neural network model for emotional voice conversion (VC) that decouples speaker-independent emotional features and reports state-of-the-art results using a two-stage training strategy in the 2D valence-arousal (VA) space.
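The VA space places each emotion at a point defined by valence (how pleasant or unpleasant it is) and arousal (how activated or calm it is). The sketch below shows one way to represent emotion labels as 2-D VA vectors so that emotions can be compared by distance. The coordinates are illustrative assumptions following the usual circumplex placement (happy: positive valence, high arousal; angry: negative, high; sad: negative, low); they are not the paper's values, the sketch does not reproduce its two-stage procedure, and `VA_COORDS`, `va_vector`, `va_distance`, and `quadrant` are hypothetical names.

```python
import math

# Hypothetical (valence, arousal) coordinates in [-1, 1].
# Quadrant placement follows the common circumplex convention;
# the exact numbers are illustrative assumptions, not SFEVC's values.
VA_COORDS = {
    "neutral": (0.0, 0.0),
    "happy": (0.8, 0.6),    # positive valence, high arousal
    "angry": (-0.6, 0.8),   # negative valence, high arousal
    "sad": (-0.6, -0.8),    # negative valence, low arousal
}


def va_vector(label):
    """Return the (valence, arousal) point for an emotion label."""
    try:
        return VA_COORDS[label]
    except KeyError:
        raise ValueError(f"unknown emotion label: {label!r}") from None


def va_distance(a, b):
    """Euclidean distance between two emotions in the VA plane."""
    (v1, a1), (v2, a2) = va_vector(a), va_vector(b)
    return math.hypot(v1 - v2, a1 - a2)


def quadrant(label):
    """Describe which VA quadrant an emotion falls in."""
    v, a = va_vector(label)
    if v == 0 and a == 0:
        return "origin"
    valence = "positive" if v >= 0 else "negative"
    arousal = "high" if a >= 0 else "low"
    return f"{valence} valence, {arousal} arousal"
```

With these assumed coordinates, angry and sad share negative valence and differ only in arousal, so `va_distance("angry", "sad")` is about 1.6.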

Contribution

The paper proposes a new source-filter-based model, trained with a two-stage strategy, that improves emotional voice conversion on non-parallel data.
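In the source-filter view of speech, the excitation source determines pitch (F0), and the vocal-tract filter shapes the spectral envelope that carries timbre. As a minimal, hypothetical illustration of these two feature streams, the NumPy sketch below estimates the F0 of one frame from its autocorrelation peak and computes a smoothed log spectral envelope by cepstral liftering. These are classical signal-processing stand-ins chosen for clarity; the paper's networks learn their own representations, and `estimate_f0`, `spectral_envelope`, and their default parameters are assumptions.

```python
import numpy as np


def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Source: estimate F0 (Hz) of a voiced frame from its autocorrelation peak."""
    frame = frame - frame.mean()
    # Keep non-negative lags only: ac[k] is the correlation at lag k samples.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)   # shortest period considered
    lag_max = int(sr / fmin)   # longest period considered
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sr / lag


def spectral_envelope(frame, n_ceps=30):
    """Filter: smoothed log-magnitude spectrum via low-quefrency cepstral liftering."""
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cep = np.fft.irfft(log_mag)          # real cepstrum, symmetric in quefrency
    lifter = np.zeros_like(cep)
    lifter[:n_ceps] = 1.0                # keep low quefrencies (envelope)
    if n_ceps > 1:
        lifter[-(n_ceps - 1):] = 1.0     # and their symmetric counterparts
    # Back to the frequency domain; the result is the smoothed log envelope.
    return np.fft.rfft(cep * lifter).real
```

For a 1024-sample frame of a 200 Hz tone with a few harmonics sampled at 16 kHz, `estimate_f0` returns a value close to 200 Hz (a period of about 80 samples), and `spectral_envelope` returns 513 smoothed log-magnitude bins. Real speech would also need framing and voicing detection, which this sketch omits.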

Findings

01. Outperforms baseline methods in emotional VC quality
02. Achieves state-of-the-art results in speaker-independent emotional VC
03. Effectively decouples emotional features from speaker identity

Abstract

Emotional voice conversion (VC) aims to convert a neutral voice into an emotional one (e.g., happy) while retaining the linguistic information and speaker identity. We note that decoupling emotional features from other speech information (such as speaker and content) is the key to achieving strong performance. Recent speech-representation decoupling methods designed for neutral speech do not work well on emotional speech, because emotional speech involves more complex acoustic properties. To address this problem, we propose a novel Source-Filter-based Emotional VC model (SFEVC) that properly filters speaker-independent emotion features out of both the timbre and pitch features. Our SFEVC model consists of multi-channel encoders, emotion-separate encoders, and one decoder. Note that all encoder modules adopt a designed information bottleneck…
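The abstract is truncated before it specifies the designed information bottleneck, so the following is only a sketch of the general idea under stated assumptions. Each encoder's output is squeezed through a narrow channel dimension and a coarser time resolution, a design used in autoencoder-based VC so that each code can carry only limited information. The decoder then receives the frame-repeated codes together with an emotion vector. The projection matrices stand in for learned layers, the mel and F0 inputs are random placeholders, and `bottleneck`, `decoder_input`, the channel sizes, and the downsampling factors are hypothetical choices, not SFEVC's configuration.

```python
import numpy as np


def bottleneck(features, projection, downsample):
    """Squeeze a (T, C) sequence through a narrow, temporally coarse code.

    projection: (C, d) matrix with small d (stands in for a learned layer).
    downsample: keep every `downsample`-th frame.
    Returns the coarse code (ceil(T / downsample), d) and a frame-repeated
    copy (T, d) aligned with the input, as a decoder would consume it.
    """
    if downsample < 1:
        raise ValueError("downsample must be >= 1")
    T = features.shape[0]
    narrow = features @ projection        # channel bottleneck
    coarse = narrow[::downsample]         # temporal bottleneck
    restored = np.repeat(coarse, downsample, axis=0)[:T]
    return coarse, restored


def decoder_input(restored_codes, emotion_embedding):
    """Concatenate per-encoder codes with a per-utterance emotion vector."""
    T = restored_codes[0].shape[0]
    emo = np.tile(emotion_embedding, (T, 1))   # same vector on every frame
    return np.concatenate(list(restored_codes) + [emo], axis=1)


rng = np.random.default_rng(0)
mel = rng.standard_normal((100, 80))   # placeholder 80-bin mel frames
f0 = rng.standard_normal((100, 1))     # placeholder normalized log-F0 track
_, content = bottleneck(mel, rng.standard_normal((80, 8)), downsample=8)
_, pitch = bottleneck(f0, rng.standard_normal((1, 2)), downsample=4)
emotion = np.array([0.8, 0.6])         # hypothetical emotion vector (e.g. a VA point)
x = decoder_input([content, pitch], emotion)   # shape (100, 12)
```

Shrinking the projection width or raising `downsample` tightens a bottleneck. In trained models such sizes are tuned so that each code keeps what it must carry while other information is squeezed out.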

Code & Models

Repositories

ZhaojieL/HTE-data

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing