# DiaBLa: A Corpus of Bilingual Spontaneous Written Dialogues for Machine Translation

**Authors:** Rachel Bawden, Sophie Rosset, Thomas Lavergne, Eric Bilinski

arXiv: 1905.13354 · 2019-06-03

## TL;DR

DiaBLa is an English-French test set of spontaneous written dialogues mediated by machine translation. It comes with sentence-level quality judgments from the participants and with reference translations, and is designed for evaluating and analyzing MT in informal conversation.

## Contribution

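The paper introduces a corpus of 144 spontaneous bilingual dialogues (5,700+ sentences) between native English and French speakers. Each dialogue is mediated by one of two neural MT systems and accompanied by fine-grained sentence-level judgments of MT quality, manually normalised versions, and reference translations. The corpus supports both MT evaluation and the analysis of MT-mediated communication.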

## Key findings

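- A preliminary analysis confirms that the participants' judgments reveal perceptible differences in MT quality between the two MT systems used.
- The corpus enables evaluation of MT systems on spontaneous, informal written dialogue in role-play settings.
- The corpus also provides a resource for analyzing MT-mediated communication.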
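
The first finding rests on comparing sentence-level judgments across the two MT systems. A minimal sketch of such a comparison is shown below. It is an illustration only: the field names (`system`, `label`), the system names, the judgment labels, and the toy records are all hypothetical and do not reflect the corpus's actual schema or results.

```python
from collections import Counter, defaultdict

# Toy records; every field name, system name and label here is an
# assumption made for illustration, not the real DiaBLa format or data.
judgments = [
    {"system": "system_A", "label": "good"},
    {"system": "system_A", "label": "good"},
    {"system": "system_A", "label": "medium"},
    {"system": "system_A", "label": "poor"},
    {"system": "system_B", "label": "good"},
    {"system": "system_B", "label": "poor"},
    {"system": "system_B", "label": "poor"},
    {"system": "system_B", "label": "poor"},
]


def label_shares(records):
    """Return, for each MT system, the share of sentences given each label."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["system"]][record["label"]] += 1

    shares = {}
    for system, label_counts in counts.items():
        total = sum(label_counts.values())
        shares[system] = {
            label: n / total for label, n in label_counts.items()
        }
    return shares


shares = label_shares(judgments)
for system in sorted(shares):
    print(system, shares[system])
```

Comparing the per-system label shares gives a first impression of whether participants perceived one system as better. A real analysis would also need to account for dialogue-level and participant-level effects.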

## Abstract

We present a new English-French test set for the evaluation of Machine Translation (MT) for informal, written bilingual dialogue. The test set contains 144 spontaneous dialogues (5,700+ sentences) between native English and French speakers, mediated by one of two neural MT systems in a range of role-play settings. The dialogues are accompanied by fine-grained sentence-level judgments of MT quality, produced by the dialogue participants themselves, as well as by manually normalised versions and reference translations produced a posteriori. The motivation for the corpus is two-fold: to provide (i) a unique resource for evaluating MT models, and (ii) a corpus for the analysis of MT-mediated communication. We provide a preliminary analysis of the corpus to confirm that the participants' judgments reveal perceptible differences in MT quality between the two MT systems used.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.13354/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1905.13354/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1905.13354/full.md

---
Source: https://tomesphere.com/paper/1905.13354