Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation

Eva Vanmassenhove; Dimitar Shterionov; Andy Way

arXiv:1906.12068 · cs.CL · July 1, 2019 · 48 cites

Open Access

TL;DR

This paper empirically quantifies how current machine translation systems diminish lexical diversity compared to human translation, highlighting issues of bias amplification and loss of linguistic richness.

Contribution

It introduces a method to measure lexical richness loss in MT and demonstrates how MT systems tend to reduce diversity and reinforce biases.

Findings

01. MT systems show significant lexical richness loss
02. MT amplifies frequent patterns and biases
03. Human translation maintains higher lexical diversity

Abstract

This work presents an empirical approach to quantifying the loss of lexical richness in Machine Translation (MT) systems compared to Human Translation (HT). Our experiments show how current MT systems indeed fail to render the lexical diversity of human-generated or human-translated text. The inability of MT systems to generate diverse outputs, and their tendency to exacerbate already frequent patterns while ignoring less frequent ones, might be the underlying cause of, among other things, the currently heavily debated issues related to gender-biased output. Can we indeed, aside from biased data, talk about an algorithm that exacerbates seen biases?
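
To make "lexical richness" concrete, here is a minimal sketch of one common diversity measure, the type-token ratio (distinct words divided by total words). The abstract does not say which metric the authors use, so treating type-token ratio as the measure is an assumption made for illustration only. The function name `type_token_ratio`, the whitespace tokenization, and both example sentences are hypothetical and are not taken from the paper or its data.

```python
def type_token_ratio(text: str) -> float:
    """Return distinct tokens / total tokens for a whitespace-tokenized text.

    Assumption: lowercasing plus whitespace splitting stands in for a real
    tokenizer; this is an illustrative simplification.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)


# Invented toy sentences (not data from the paper): the second one reuses
# frequent words where the first uses rarer alternatives.
human_like = "the doctor finished her shift and the surgeon began his rounds"
machine_like = "the doctor finished his shift and the doctor began his shift"

print(f"human-like TTR:   {type_token_ratio(human_like):.3f}")
print(f"machine-like TTR: {type_token_ratio(machine_like):.3f}")
```

A lower ratio for the second sentence reflects the kind of effect the abstract describes: frequent words crowd out less frequent ones. Type-token ratio is sensitive to text length, so comparisons are only meaningful between texts of similar size.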

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices