Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation
Eva Vanmassenhove, Dimitar Shterionov, Matthew Gwilliam

TL;DR
This paper investigates how algorithmic bias in machine translation models leads to reduced linguistic richness, demonstrating that both phrase-based and neural MT paradigms produce translations with diminished lexical and morphological diversity.
Contribution
It extends bias analysis in MT beyond gender, showing that algorithmic bias causes language impoverishment across multiple paradigms and language pairs.
Findings
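All MT paradigms examined produce translations with reduced lexical richness.
Morphological diversity also decreases in the translations.
Bias amplification, i.e. the exacerbation of frequent patterns at the expense of less frequent ones, reduces linguistic complexity.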
Abstract
Recent studies in the field of Machine Translation (MT) and Natural Language Processing (NLP) have shown that existing models amplify biases observed in the training data. The amplification of biases in language technology has mainly been examined with respect to specific phenomena, such as gender bias. In this work, we go beyond the study of gender in MT and investigate how bias amplification might affect language in a broader sense. We hypothesize that the 'algorithmic bias', i.e. an exacerbation of frequently observed patterns in combination with a loss of less frequent ones, not only exacerbates societal biases present in current datasets but could also lead to an artificially impoverished language: 'machine translationese'. We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms - phrase-based…
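To make "lexical richness" concrete, here is a minimal sketch of one common measure, the type-token ratio (distinct words divided by total words). The excerpt above does not say which metrics the paper uses, so the choice of metric, the function name `type_token_ratio`, the whitespace tokenization, and both example sentences are assumptions made purely for illustration; they are not data or methods from the paper.

```python
def type_token_ratio(text: str) -> float:
    """Return distinct tokens / total tokens for a whitespace-tokenized text.

    Illustrative only: lowercasing and whitespace splitting are simplifying
    assumptions, not the paper's preprocessing.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)


# Invented sentences (not from the paper): the second reuses frequent words,
# mimicking the loss of less frequent patterns the abstract hypothesizes.
varied = "the cat sat on the mat while a dog slept nearby"
repetitive = "the cat sat on the mat while the dog sat on the mat"

print(f"varied:     {type_token_ratio(varied):.3f}")
print(f"repetitive: {type_token_ratio(repetitive):.3f}")
```

A lower ratio means fewer distinct word forms for the same amount of text. Because the type-token ratio is sensitive to text length, comparisons are only meaningful between samples of similar size.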
