Gender Coreference and Bias Evaluation at WMT 2020

Tom Kocmi; Tomasz Limisiewicz; Gabriel Stanovsky

arXiv:2010.06018·cs.CL·October 14, 2020

TL;DR

Using an extended WinoMT test suite, this paper evaluates gender bias in machine translation systems across four target languages (Czech, German, Polish, and Russian) and finds that the systems consistently rely on spurious correlations rather than contextual cues.

Contribution

It presents the largest evaluation of gender bias in MT systems across four languages, extending the WinoMT test suite to Polish and Czech, and highlights the pervasive reliance on stereotypes.

Findings

01 All systems use spurious gender correlations

02 Bias is consistent across diverse languages

03 Models do not effectively utilize contextual information

Abstract

Gender bias in machine translation can manifest when gender inflections are chosen based on spurious gender correlations, for example by always translating doctors as men and nurses as women. This can be particularly harmful as models become more popular and are deployed within commercial systems. Our work presents the largest body of evidence for the phenomenon, covering more than 19 systems submitted to WMT over four diverse target languages: Czech, German, Polish, and Russian. To achieve this, we use WinoMT, a recent automatic test suite that examines gender coreference and bias when translating from English to languages with grammatical gender. We extend WinoMT to handle two new languages tested in WMT: Polish and Czech. We find that all systems consistently use spurious correlations in the data rather than meaningful contextual information.
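The kind of measurement the abstract describes can be illustrated with a minimal sketch. It is not the WinoMT implementation: the `Example` record, the `accuracy` and `stereotype_gap` helpers, and the toy data are all hypothetical, and the sketch assumes that the gender of the translated entity has already been extracted from the target sentence. It compares how often a system gets gender right when the context agrees with an occupational stereotype versus when it contradicts it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One evaluated sentence (hypothetical record, not WinoMT's format)."""
    gold_gender: str       # gender implied by the English context
    predicted_gender: str  # gender of the entity in the translation
    stereotypical: bool    # True if gold gender matches the occupational stereotype


def accuracy(examples: List[Example]) -> float:
    """Fraction of examples whose translated gender matches the context."""
    if not examples:
        return 0.0
    correct = sum(e.gold_gender == e.predicted_gender for e in examples)
    return correct / len(examples)


def stereotype_gap(examples: List[Example]) -> float:
    """Accuracy on pro-stereotypical minus accuracy on anti-stereotypical examples.

    A large positive value suggests the system follows the stereotype
    instead of the contextual cue.
    """
    pro = [e for e in examples if e.stereotypical]
    anti = [e for e in examples if not e.stereotypical]
    return accuracy(pro) - accuracy(anti)


# Toy data: a system that always renders doctors as male and nurses as female.
examples = [
    Example("male", "male", True),      # doctor, context says male
    Example("female", "male", False),   # doctor, context says female
    Example("female", "female", True),  # nurse, context says female
    Example("male", "female", False),   # nurse, context says male
]

overall = accuracy(examples)
gap = stereotype_gap(examples)
print(f"accuracy={overall:.2f} gap={gap:.2f}")
```

On this toy data the system is right only when the context happens to match the stereotype, so the gap between the two subsets is as large as it can be. That is the pattern the abstract calls reliance on spurious correlations.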
