A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation

Anna Min; Chenxu Hu; Yi Ren; Hang Zhao

arXiv:2502.00374·cs.CL·February 4, 2025

Open Access

TL;DR

This paper presents a new dataset and model for speech-to-speech translation that emphasizes preserving paralinguistic information like emotions and attitudes, improving naturalness and expressiveness.

Contribution

It introduces a multilingual dataset with aligned paralinguistic features and a novel model integrating prosody transfer techniques for expressive translation.

Findings

01 Model retains more paralinguistic information

02 Achieves high translation accuracy

03 Enhances naturalness of translated speech

Abstract

Current research in speech-to-speech translation (S2ST) primarily concentrates on translation accuracy and speech naturalness, often overlooking key elements like paralinguistic information, which is essential for conveying emotions and attitudes in communication. To address this, our research introduces a novel, carefully curated multilingual dataset from various movie audio tracks. Each dataset pair is precisely matched for paralinguistic information and duration. We enhance this by integrating multiple prosody transfer techniques, aiming for translations that are accurate, natural-sounding, and rich in paralinguistic details. Our experimental results confirm that our model retains more paralinguistic information from the source speech while maintaining high standards of translation accuracy and naturalness.

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems