Corpus of Cross-lingual Dialogues with Minutes and Detection of Misunderstandings
Marko Čechovič, Natália Komorníková, Dominik Macháček, Ondřej Bojar

TL;DR
This paper introduces a new multilingual dialogue corpus with minutes and annotations of misunderstandings, facilitating research on cross-lingual communication and automatic misunderstanding detection in multilingual meetings.
Contribution
The creation of a comprehensive cross-lingual dialogue corpus with minutes and the evaluation of large language models for automatic misunderstanding detection.
Findings
The Gemini model detects misunderstandings with 77% recall and 47% precision.
The corpus includes 5 hours of speech in 12 original languages, with ASR and gold transcripts and automatic and corrected translations into English.
Automatic misunderstanding detection shows promising results for cross-lingual communication.
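To put the recall and precision figures above on a single scale, one common option is the F1 score, the harmonic mean of the two. The sketch below is only an illustration derived from the two quoted percentages; the summary itself does not report an F1 value, and the function name `f1_score` is a hypothetical helper, not something from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the findings (Gemini, misunderstanding detection).
precision, recall = 0.47, 0.77

# Derived here for illustration; not a number reported in the summary.
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.584"
```

Read this way, the detector finds most of the annotated misunderstandings (high recall), but at 47% precision slightly more than half of the items it flags would be false alarms.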
Abstract
Speech processing and translation technology have the potential to facilitate meetings of individuals who do not share any common language. To evaluate automatic systems for such a task, a versatile and realistic evaluation corpus is needed. Therefore, we create and present a corpus of cross-lingual dialogues between individuals without a common language who were facilitated by automatic simultaneous speech translation. The corpus consists of 5 hours of speech recordings with ASR and gold transcripts in 12 original languages and automatic and corrected translations into English. For the purposes of research into cross-lingual summarization, our corpus also includes written summaries (minutes) of the meetings. Moreover, we propose automatic detection of misunderstandings. For an overview of this task and its complexity, we attempt to quantify misunderstandings in cross-lingual…
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Text Readability and Simplification
