TL;DR
This paper evaluates gender bias in machine translation systems submitted to WMT across four target languages (Czech, German, Polish, and Russian) using an extended WinoMT test suite, and finds that the systems consistently rely on spurious correlations rather than contextual cues.
Contribution
The paper presents what it describes as the largest body of evidence for gender bias in MT, covering more than 19 WMT submissions across four target languages. It extends the WinoMT test suite to Polish and Czech and shows that reliance on stereotypes is pervasive.
Findings
All systems use spurious gender correlations
Bias is consistent across diverse languages
Models do not effectively utilize contextual information
Abstract
Gender bias in machine translation can manifest when gender inflections are chosen based on spurious gender correlations, for example by always translating doctors as men and nurses as women. This can be particularly harmful as models become more popular and are deployed within commercial systems. Our work presents the largest evidence for the phenomenon, covering more than 19 systems submitted to WMT over four diverse target languages: Czech, German, Polish, and Russian. To achieve this, we use WinoMT, a recent automatic test suite that examines gender coreference and bias when translating from English to languages with grammatical gender. We extend WinoMT to handle two new languages tested in WMT: Polish and Czech. We find that all systems consistently use spurious correlations in the data rather than meaningful contextual information.
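To make the kind of measurement described above concrete, the following is a minimal sketch of how one might score a system once each translated sentence has been labeled with the gender of the inflected entity. It is not the WinoMT implementation: the `Example` record, the `accuracy` and `stereotype_gap` helpers, and the toy data are all hypothetical, and the gap is computed here as a simple accuracy difference between pro-stereotypical and anti-stereotypical examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One evaluated sentence (hypothetical record layout, not WinoMT's)."""
    gold_gender: str       # gender implied by the English context
    predicted_gender: str  # gender of the inflected entity in the translation
    stereotypical: bool    # True if the gold gender matches the occupational stereotype


def accuracy(examples: List[Example]) -> float:
    """Fraction of examples whose translated gender matches the gold gender."""
    if not examples:
        return 0.0
    correct = sum(e.gold_gender == e.predicted_gender for e in examples)
    return correct / len(examples)


def stereotype_gap(examples: List[Example]) -> float:
    """Accuracy on pro-stereotypical minus accuracy on anti-stereotypical examples.

    A large positive value suggests the system follows stereotypes
    rather than the contextual cue in the source sentence.
    """
    pro = [e for e in examples if e.stereotypical]
    anti = [e for e in examples if not e.stereotypical]
    return accuracy(pro) - accuracy(anti)


# Toy data, invented for illustration only.
examples = [
    Example("male", "male", True),       # male doctor, translated as masculine
    Example("female", "male", False),    # female doctor, translated as masculine
    Example("female", "female", True),   # female nurse, translated as feminine
    Example("male", "female", False),    # male nurse, translated as feminine
    Example("female", "female", False),  # anti-stereotypical case handled correctly
]

overall = accuracy(examples)
gap = stereotype_gap(examples)
```

A system that used only the contextual cue would score similarly on both subsets, so the gap would be close to zero; a system that falls back on occupational stereotypes would score well on the pro-stereotypical subset and poorly on the other.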
