Towards cross-language prosody transfer for dialog
Jonathan E. Avila, Nigel G. Ward

TL;DR
This paper explores the challenges of prosody transfer in speech-to-speech translation for dialog. It introduces a bilingual English-Spanish corpus and a prosodic dissimilarity metric, and uses them to analyze cross-language prosodic differences.
Contribution
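The paper presents a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, together with a simple metric for prosodic dissimilarity. It uses both to identify phenomena that need more powerful modeling.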
Findings
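Collected a bilingual English-Spanish corpus, so far comprising 1871 matched utterance pairs.
Defined a prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features.
Measured the likely utility of three simple baseline models and identified phenomena that will require more powerful prosody transfer models.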
Abstract
Speech-to-speech translation systems today do not adequately support use for dialog purposes. In particular, nuances of speaker intent and stance can be lost due to improper prosody transfer. We present an exploration of what needs to be done to overcome this. First, we developed a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, and used this to collect an English-Spanish corpus, so far comprising 1871 matched utterance pairs. Second, we developed a simple prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features. We then used these to investigate cross-language prosodic differences, measure the likely utility of three simple baseline models, and identify phenomena which will require more powerful modeling. Our findings should inform future research on cross-language…
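The metric described in the abstract, Euclidean distance over a set of prosodic features, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature values, the per-feature z-normalization step, and the function names `zscore_columns` and `prosodic_dissimilarity` are assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-normalize each feature (column) across all utterances.

    Assumption: normalizing keeps features on different scales from
    dominating the distance. The abstract does not specify this step.
    """
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def prosodic_dissimilarity(a, b):
    """Euclidean distance between two equal-length prosodic feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.dist(a, b)


# Hypothetical feature vectors (e.g. pitch, intensity, rate measures);
# the values are invented and do not come from the corpus.
utterances = [
    [210.0, 62.0, 4.1],  # utterance A
    [180.0, 58.0, 5.0],  # utterance B
    [205.0, 61.0, 4.3],  # utterance C
]
normalized = zscore_columns(utterances)
d_ab = prosodic_dissimilarity(normalized[0], normalized[1])
d_ac = prosodic_dissimilarity(normalized[0], normalized[2])
print(f"A vs B: {d_ab:.3f}, A vs C: {d_ac:.3f}")
```

With such a metric, a smaller distance means two utterances are prosodically more similar under the chosen features. How well that tracks perceived similarity depends on the feature set, which the sketch leaves open.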
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling
