A Parallel Corpus of Translationese
Ella Rabinovich, Shuly Wintner, Ofek Luis Lewinsohn

TL;DR
This paper introduces bilingual English–French and English–German parallel corpora in which the direction of translation is annotated, supporting research on translationese identification and its applications to human and machine translation.
Contribution
It provides diverse corpora, annotated with translation direction, for translationese research, validated through replication and extension of previous identification experiments.
Findings
Replicated previous results of supervised and unsupervised translationese identification
Extended the experiments to additional datasets and languages
Validated the quality and reliability of the corpora
Abstract
We describe a set of bilingual English–French and English–German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks and political commentary. They will be instrumental for research on translationese and its applications to (human and machine) translation; specifically, they can be used for the task of translationese identification, a research direction that has attracted growing interest in recent years. To validate the quality and reliability of the corpora, we replicated previous results of supervised and unsupervised identification of translationese, and further extended the experiments to additional datasets and languages.
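Why direction annotation matters for supervised identification: it supplies a label, original ("O") or translated ("T"), for every chunk of text, so a classifier can be trained on those labels. The following is a minimal sketch of that setup, not the authors' experimental pipeline (which this page does not describe). It assumes a hypothetical ten-word function-word list as the feature set (function-word frequencies are a standard stylometric feature), invented toy sentences standing in for corpus chunks, and a nearest-centroid classifier chosen only for brevity.

```python
from collections import Counter
import math

# Hypothetical, deliberately tiny function-word list (an assumption for illustration).
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "it", "which", "this"]

def featurize(text):
    """Relative frequency of each function word in one chunk of text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty chunks
    return [counts[w] / n for w in FUNCTION_WORDS]

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def train(chunks, labels):
    """Nearest-centroid model: one mean feature vector per label ('O' or 'T')."""
    by_label = {}
    for chunk, label in zip(chunks, labels):
        by_label.setdefault(label, []).append(featurize(chunk))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, chunk):
    """Assign the label whose centroid is closest in Euclidean distance."""
    x = featurize(chunk)
    return min(model, key=lambda label: math.dist(x, model[label]))

# Invented toy chunks; in practice the labels would come from the corpora's
# translation-direction annotation, not from these made-up sentences.
train_chunks = [
    "we agree and we will vote now",
    "members agree and the motion passes",
    "it is the view of the committee that the report of the commission is adopted",
    "this is the position of the council which the house supports",
]
train_labels = ["O", "O", "T", "T"]

model = train(train_chunks, train_labels)
print(predict(model, "they agree and the vote passes"))
```

The toy split is artificial and says nothing about real translationese; it only shows how annotated direction turns a parallel corpus into labeled training data. A realistic setup would use many more features, longer chunks, and cross-validation.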