Enhancing expressivity transfer in textless speech-to-speech translation

Jarod Duret (LIA); Benjamin O'Brien (LIA); Yannick Estève (LIA); Titouan Parcollet (CAM)

arXiv:2310.07279·cs.SD·October 12, 2023

Open Access

TL;DR

This paper introduces a novel method for textless speech-to-speech translation that enhances the transfer of expressivity, including emotions and nuances, across languages by leveraging multilingual emotion embeddings at the speech unit level.

Contribution

The study proposes a new approach using multilingual emotion embeddings to improve expressivity transfer in speech-to-speech translation, operating at the discrete speech unit level.

Findings

01

Superior expressivity transfer compared to existing systems

02

Effective prediction of pitch and duration in the target language

03

Validated on a French-to-English translation task

Abstract

Textless speech-to-speech translation systems are rapidly advancing, thanks to the integration of self-supervised learning techniques. However, existing state-of-the-art systems fall short when it comes to capturing and transferring expressivity accurately across different languages. Expressivity plays a vital role in conveying emotions, nuances, and cultural subtleties, thereby enhancing communication across diverse languages. To address this issue, this study presents a novel method that operates at the discrete speech unit level and leverages multilingual emotion embeddings to capture language-agnostic information. Specifically, we demonstrate how these embeddings can be used to effectively predict the pitch and duration of speech units in the target language. Through objective and subjective experiments conducted on a French-to-English translation task, our findings highlight the…

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques