Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation

Eva Vanmassenhove; Dimitar Shterionov; Matthew Gwilliam

arXiv:2102.00287 · cs.CL · February 2, 2021

TL;DR

This paper investigates how algorithmic bias in machine translation models leads to reduced linguistic richness, demonstrating that both phrase-based and neural MT paradigms produce translations with diminished lexical and morphological diversity.

Contribution

It extends bias analysis in MT beyond gender, showing that algorithmic bias causes language impoverishment across multiple paradigms and language pairs.

Findings

01. All MT paradigms examined (phrase-based and neural) produce translations with reduced lexical richness.

02. Morphological diversity is lower in the translations as well.

03. Bias amplification reduces the linguistic complexity of MT output.

Abstract

Recent studies in the field of Machine Translation (MT) and Natural Language Processing (NLP) have shown that existing models amplify biases observed in the training data. The amplification of biases in language technology has mainly been examined with respect to specific phenomena, such as gender bias. In this work, we go beyond the study of gender in MT and investigate how bias amplification might affect language in a broader sense. We hypothesize that the 'algorithmic bias', i.e. an exacerbation of frequently observed patterns in combination with a loss of less frequent ones, not only exacerbates societal biases present in current datasets but could also lead to an artificially impoverished language: 'machine translationese'. We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms - phrase-based…
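To make the notion of lost lexical richness concrete, here is a minimal sketch in Python. It is a toy illustration, not the authors' evaluation setup: the type-token ratio (TTR) is one common lexical diversity measure, chosen here as an assumption, and the helper `collapse_to_frequent`, the `frequent_variant` mapping, and the example sentences are all hypothetical.

```python
def type_token_ratio(tokens):
    """Distinct word types divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def collapse_to_frequent(tokens, frequent_variant):
    """Toy stand-in for algorithmic bias: replace each rarer word with its
    more frequent alternative, leaving all other tokens unchanged.
    The mapping is hand-written for illustration, not learned from data."""
    return [frequent_variant.get(tok, tok) for tok in tokens]


# Hypothetical example text with three different reporting verbs.
reference = "she said hello he replied quickly they answered late".split()

# Assumed mapping from less frequent verbs to the most frequent one.
frequent_variant = {"replied": "said", "answered": "said"}

flattened = collapse_to_frequent(reference, frequent_variant)

ttr_reference = type_token_ratio(reference)   # 9 types over 9 tokens
ttr_flattened = type_token_ratio(flattened)   # 7 types over 9 tokens

print(f"reference TTR: {ttr_reference:.3f}")
print(f"flattened TTR: {ttr_flattened:.3f}")
```

Both token lists have the same length, so the drop in TTR comes entirely from the rarer variants disappearing, which is the pattern the abstract calls an artificially impoverished language. TTR is sensitive to text length, so comparisons are only meaningful between texts of similar size.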
