Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion
Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li

TL;DR
This paper introduces Mixed-EVC, a novel voice conversion framework that synthesizes and controls mixed emotional expressions in speech by leveraging discrete emotion labels and an attribute vector for nuanced emotional rendering.
Contribution
The paper presents a new EVC method that models mixed emotions and enhances control over them using an attribute vector predicted by a ranking-based SVM, surpassing traditional discrete emotion conversion methods.
Findings
Effective synthesis of mixed emotions confirmed by evaluations
Enhanced control over emotional expression demonstrated
Outperforms traditional discrete emotion conversion baselines
Abstract
Emotional voice conversion (EVC) traditionally targets the transformation of spoken utterances from one emotional state to another, with previous research mainly focusing on discrete emotion categories. This paper departs from the norm by introducing a novel perspective: nuanced rendering of mixed emotions and enhanced control over emotional expression. To achieve this, we propose a novel EVC framework, Mixed-EVC, which leverages only discrete emotion training labels. We construct an attribute vector that encodes the relationships among these discrete emotions, which is predicted using a ranking-based support vector machine and then integrated into a sequence-to-sequence (seq2seq) EVC framework. Mixed-EVC not only learns to characterize the input emotional style but also quantifies its relevance to other emotions during training. As a result, users have the ability to assign these…
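To make the ranking-based SVM idea in the abstract more concrete, here is a minimal sketch of a pairwise ranking SVM trained by subgradient descent on a hinge loss. It is an illustration under stated assumptions, not the authors' implementation: the 2-D feature vectors are made up, and the names `train_rank_svm`, `relevance`, `happy`, and `neutral` are hypothetical. The min-max normalisation into [0, 1] is likewise an assumed way of turning a ranking score into one entry of an attribute vector.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_rank_svm(stronger, weaker, epochs=200, lr=0.05, reg=0.01, seed=0):
    """Learn w so that dot(w, s) exceeds dot(w, k) by a margin of 1
    for every (stronger, weaker) pair, using a pairwise hinge loss."""
    rng = random.Random(seed)
    dim = len(stronger[0])
    w = [0.0] * dim
    pairs = [(s, k) for s in stronger for k in weaker]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for s, k in pairs:
            diff = [a - b for a, b in zip(s, k)]
            violated = dot(w, diff) < 1.0
            for i in range(dim):
                # L2 shrinkage, plus a hinge step only when the margin is violated
                grad = reg * w[i] - (diff[i] if violated else 0.0)
                w[i] -= lr * grad
    return w

def relevance(w, x, lo, hi):
    """Min-max normalise a ranking score into [0, 1] (assumed normalisation)."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (dot(w, x) - lo) / (hi - lo)))

# Toy, made-up features: "happy" utterances should rank above "neutral" ones.
happy = [[2.0, 0.1], [1.8, 0.3], [2.2, 0.0]]
neutral = [[0.2, 0.1], [0.0, 0.4], [0.3, 0.2]]

w_happy = train_rank_svm(happy, neutral)
train_scores = [dot(w_happy, x) for x in happy + neutral]
lo, hi = min(train_scores), max(train_scores)

# Relevance of an unseen (hypothetical) utterance to "happy".
query = [1.1, 0.2]
happy_relevance = relevance(w_happy, query, lo, hi)
```

In this sketch, one such ranker per emotion would each contribute one normalised score, and stacking those scores gives a vector describing how strongly an utterance relates to each emotion. How Mixed-EVC actually builds and conditions on its attribute vector is described in the paper itself.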
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
