Enhancing expressivity transfer in textless speech-to-speech translation
Jarod Duret (LIA), Benjamin O'Brien (LIA), Yannick Estève (LIA), Titouan Parcollet (CAM)

TL;DR
This paper introduces a novel method for textless speech-to-speech translation that enhances the transfer of expressivity, including emotions and nuances, across languages by leveraging multilingual emotion embeddings at the speech unit level.
Contribution
The study proposes a new approach using multilingual emotion embeddings to improve expressivity transfer in speech-to-speech translation, operating at the discrete speech unit level.
Findings
Superior expressivity transfer compared to existing systems
Effective prediction of pitch and duration in target language
Validated on French-to-English translation task
Abstract
Textless speech-to-speech translation systems are rapidly advancing, thanks to the integration of self-supervised learning techniques. However, existing state-of-the-art systems fall short when it comes to capturing and transferring expressivity accurately across different languages. Expressivity plays a vital role in conveying emotions, nuances, and cultural subtleties, thereby enhancing communication across diverse languages. To address this issue, this study presents a novel method that operates at the discrete speech unit level and leverages multilingual emotion embeddings to capture language-agnostic information. Specifically, we demonstrate how these embeddings can be used to effectively predict the pitch and duration of speech units in the target language. Through objective and subjective experiments conducted on a French-to-English translation task, our findings highlight the…
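To make the idea of conditioning unit-level prosody on an emotion embedding concrete, here is a minimal Python sketch. It is not the authors' model: the class `ProsodyPredictor`, its dimensions, the single linear head, and the random weights are all hypothetical stand-ins for trained components, and the numbers it produces carry no acoustic meaning. It only illustrates the data flow the abstract describes, in which each discrete target-language unit is combined with one utterance-level emotion embedding to predict a duration and a pitch value.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ProsodyPredictor:
    """Toy per-unit duration/pitch predictor (hypothetical, untrained).

    Assumption: the emotion embedding is one vector per utterance and is
    concatenated with a learned embedding of each discrete speech unit.
    """

    def __init__(self, num_units, unit_dim, emo_dim, seed=0):
        rng = random.Random(seed)
        # Random values stand in for trained parameters.
        self.unit_table = [
            [rng.uniform(-1.0, 1.0) for _ in range(unit_dim)]
            for _ in range(num_units)
        ]
        # Two outputs per unit: log-duration and log-pitch.
        self.head = [
            [rng.uniform(-0.1, 0.1) for _ in range(unit_dim + emo_dim)]
            for _ in range(2)
        ]

    def predict(self, units, emotion_embedding):
        durations, pitches = [], []
        for unit in units:
            features = self.unit_table[unit] + list(emotion_embedding)
            log_duration, log_pitch = linear(self.head, features)
            # Each unit lasts at least one frame.
            durations.append(max(1, round(math.exp(log_duration))))
            pitches.append(math.exp(log_pitch))
        return durations, pitches


model = ProsodyPredictor(num_units=100, unit_dim=4, emo_dim=3)
units = [5, 5, 17, 42]                 # hypothetical target-language unit IDs
emotion = [0.2, -0.1, 0.7]             # hypothetical emotion embedding
durations, pitches = model.predict(units, emotion)
```

In this sketch the same unit sequence yields different pitch values when the emotion embedding changes, which is the property that would let expressivity from the source utterance influence the prosody of the translated speech.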
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
