Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data

Jarod Duret (LIA); Titouan Parcollet (CAM); Yannick Estève (LIA)

arXiv:2306.17199 · eess.AS · July 3, 2023

TL;DR

This paper introduces a multilingual emotion embedding approach for speech resynthesis that preserves emotional content across languages without requiring parallel data, improving cross-lingual emotion transfer in speech signals.

Contribution

The paper presents a novel multilingual emotion embedding method that enables emotion-preserving speech resynthesis across languages without relying on parallel datasets.

Findings

01

Outperforms a baseline that uses no emotional information

02

Effective cross-lingual emotion transfer demonstrated

03

Evaluated on English and French speech signals

Abstract

We propose a method for speech-to-speech emotion-preserving translation that operates at the level of discrete speech units. Our approach relies on a multilingual emotion embedding that captures affective information in a language-independent manner. We show that this embedding can be used to predict the pitch and duration of speech units in a target language, allowing us to resynthesize the source speech signal with the same emotional content. We evaluate our approach on English and French speech signals and show that it outperforms a baseline method that does not use emotional information, including when the emotion embedding is extracted from a different language. Although this preliminary study does not directly address the machine translation problem, our results demonstrate the effectiveness of our approach for cross-lingual emotion preservation in the context of speech…
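The abstract's central mechanism, predicting per-unit pitch and duration from discrete speech units conditioned on an emotion embedding, can be illustrated with a small data-flow sketch. Everything below is an assumption rather than the authors' model: the unit IDs, embedding sizes, the 3-frame and 120 Hz priors, and the hypothetical `ToyProsodyPredictor` class with its random, untrained linear heads are all illustrative. The sketch shows three steps: collapsing repeated frame-level units into (unit, duration) pairs, predicting a duration and a pitch value for each unit from the concatenation of its embedding and an utterance-level emotion vector, and expanding the units back to a frame sequence for resynthesis.

```python
import math
import random
from itertools import groupby


def dedup_units(frames):
    """Collapse runs of repeated unit IDs into (unit, duration_in_frames) pairs."""
    return [(unit, len(list(run))) for unit, run in groupby(frames)]


def expand(predictions):
    """Repeat each unit by its predicted duration to rebuild a frame-level sequence."""
    return [unit for unit, dur, _ in predictions for _ in range(dur)]


class ToyProsodyPredictor:
    """Hypothetical, untrained stand-in for an emotion-conditioned prosody predictor.

    Each deduplicated unit's embedding is concatenated with an utterance-level
    emotion embedding; two linear heads output a log-duration and a pitch value.
    Weights are random, so the numbers are meaningless; only the data flow matters.
    """

    def __init__(self, n_units, unit_dim, emo_dim, seed=0):
        rng = random.Random(seed)
        self.emo_dim = emo_dim
        self.unit_table = [[rng.gauss(0.0, 0.1) for _ in range(unit_dim)]
                           for _ in range(n_units)]
        in_dim = unit_dim + emo_dim
        # Row 0: log-duration head; row 1: pitch head (illustrative Hz scale).
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                        for _ in range(2)]
        self.bias = [math.log(3.0), 120.0]  # assumed priors: ~3 frames, ~120 Hz

    def predict(self, units, emotion):
        if len(emotion) != self.emo_dim:
            raise ValueError(f"expected emotion embedding of size {self.emo_dim}")
        predictions = []
        for unit in units:
            x = self.unit_table[unit] + list(emotion)  # condition on emotion
            log_dur = sum(w * v for w, v in zip(self.weights[0], x)) + self.bias[0]
            pitch = sum(w * v for w, v in zip(self.weights[1], x)) + self.bias[1]
            duration = max(1, round(math.exp(log_dur)))  # at least one frame
            predictions.append((unit, duration, pitch))
        return predictions


if __name__ == "__main__":
    pairs = dedup_units([5, 5, 5, 12, 12, 7, 7, 7, 7, 5])
    print(pairs)  # [(5, 3), (12, 2), (7, 4), (5, 1)]
    units = [unit for unit, _ in pairs]
    model = ToyProsodyPredictor(n_units=16, unit_dim=4, emo_dim=3)
    # Two hypothetical emotion vectors applied to the same unit sequence.
    emotion_a = model.predict(units, [1.0, 0.0, 0.0])
    emotion_b = model.predict(units, [0.0, 1.0, 0.0])
    print(len(expand(emotion_a)), len(expand(emotion_b)))
```

Keeping the units fixed and swapping only the emotion vector changes the predicted pitch (and possibly the durations). That separation of linguistic content (the units) from affect (the embedding) is what would let an embedding extracted from speech in one language drive prosody in another.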

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Emotion and Mood Recognition