Dialogs Re-enacted Across Languages
Nigel G. Ward, Jonathan E. Avila, Emilia Rivas, Divette Marco

TL;DR
This paper introduces a protocol for collecting closely matched pairs of utterances across languages, together with the resulting publicly released dataset, to support research on speech-to-speech translation.
Contribution
It presents a novel protocol for collecting bilingual dialog pairs, along with a publicly available dataset and observations to aid speech-to-speech translation research.
Findings
Dataset of matched bilingual dialogs released publicly
Protocol facilitates cross-language prosodic mapping research
Initial observations on dialog characteristics across languages
Abstract
To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection and its public release, and some observations and musings. This report is intended for: people using this corpus, people extending this corpus, and people designing similar collections of bilingual dialog data.
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
