Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios
Malina Chichirau, Rik van Noord, Antonio Toral

TL;DR
This paper presents a multilingual approach to automatically distinguish between human and machine translations, demonstrating improved accuracy and robustness across languages and systems using multilingual training and longer text sequences.
Contribution
It introduces a multilingual classification method that generalizes across source languages and machine translation systems, enhancing discrimination accuracy and robustness.
Findings
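A classifier trained on a single source language (German-English) still performs well on English translations from other source languages, even when the machine translations come from systems other than the one seen in training.
Adding the source text to the input of a multilingual classifier improves its accuracy and its robustness in cross-system evaluation, compared to a monolingual classifier.
Training on multiple source languages (German, Russian, and Chinese) and on longer text sequences tends to improve performance.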
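The difference between the monolingual and the multilingual (source-aware) classifier input can be illustrated with a minimal sketch. This is not the authors' code: the `Example` class, the `build_input` helper, the label ids, and the `" </s> "` separator string are all hypothetical names chosen for illustration, and a real setup would typically let the pretrained model's tokenizer join the two segments.

```python
from dataclasses import dataclass

# Hypothetical label ids, not taken from the paper.
HUMAN, MACHINE = 0, 1


@dataclass
class Example:
    source: str       # source-language sentence, e.g. German
    translation: str  # English translation to be classified
    label: int        # HUMAN or MACHINE


def build_input(ex: Example, use_source: bool, sep: str = " </s> ") -> str:
    """Return the text a classifier would see.

    use_source=False: monolingual setting, translation only.
    use_source=True:  multilingual setting, source and translation
                      joined by an assumed separator string.
    """
    if use_source:
        return ex.source + sep + ex.translation
    return ex.translation


ex = Example(source="Guten Morgen.", translation="Good morning.", label=HUMAN)
monolingual_input = build_input(ex, use_source=False)
multilingual_input = build_input(ex, use_source=True)
```

In the monolingual case the classifier only sees the English side, so it has to rely on target-side cues alone; in the source-aware case a multilingual model can also compare the translation against its source.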
Abstract
We tackle the task of automatically discriminating between human and machine translations. As opposed to most previous work, we perform experiments in a multilingual setting, considering multiple languages and multilingual pretrained language models. We show that a classifier trained on parallel data with a single source language (in our case German-English) can still perform well on English translations that come from different source languages, even when the machine translations were produced by other systems than the one it was trained on. Additionally, we demonstrate that incorporating the source text in the input of a multilingual classifier improves (i) its accuracy and (ii) its robustness on cross-system evaluation, compared to a monolingual classifier. Furthermore, we find that using training data from multiple source languages (German, Russian, and Chinese) tends to improve the…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
