Dialogs Re-enacted Across Languages

Nigel G. Ward; Jonathan E. Avila; Emilia Rivas; Divette Marco

arXiv:2211.11584 · cs.CL · July 17, 2023

TL;DR

This paper introduces a protocol for collecting closely matched pairs of utterances across languages, describes the resulting publicly released dataset, and offers observations intended to support machine learning of cross-language prosodic mappings and other improvements to speech-to-speech translation.

Contribution

It presents a protocol for collecting closely matched utterance pairs across languages, a publicly released dataset collected with that protocol, and observations to aid speech-to-speech translation research.

Findings

01 Dataset of matched bilingual dialogs released publicly

02 Protocol facilitates cross-language prosodic mapping research

03 Initial observations on dialog characteristics across languages

Abstract

To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection and its public release, and some observations and musings. This report is intended for: people using this corpus, people extending this corpus, and people designing similar collections of bilingual dialog data.

Code & Models

Repositories

joneavila/dral · Official

Datasets

jonavila/DRAL
dataset · 33 downloads

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling