Are All Spanish Doctors Male? Evaluating Gender Bias in German Machine Translation

Michelle Kappl

arXiv:2502.19104·cs.CL·March 3, 2025

TL;DR

This paper introduces WinoMTDE, a new test set for evaluating gender bias in German machine translation, together with an automatic evaluation method extended to German. An evaluation of five widely used MT systems and one large language model reveals persistent bias in most models, with the LLM outperforming the traditional systems.

Contribution

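The paper presents WinoMTDE, a new gender bias evaluation test set for German MT comprising 288 sentences balanced for gender and stereotype. It extends an existing automatic evaluation method to German and applies it in a large-scale bias assessment of five MT systems and a large language model.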

Findings

01

Most MT systems exhibit gender bias.

02

The evaluated large language model outperforms the traditional MT systems.

03

Bias remains persistent across most of the evaluated models.

Abstract

We present WinoMTDE, a new gender bias evaluation test set designed to assess occupational stereotyping and underrepresentation in German machine translation (MT) systems. Building on the automatic evaluation method introduced by arXiv:1906.00591v1, we extend the approach to German, a language with grammatical gender. The WinoMTDE dataset comprises 288 German sentences that are balanced with respect to both gender and stereotype, the latter annotated using German labor statistics. We conduct a large-scale evaluation of five widely used MT systems and a large language model. Our results reveal persistent bias in most models, with the LLM outperforming traditional systems. The dataset and evaluation code are publicly available at https://github.com/michellekappl/mt_gender_german.
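To make the idea of a test set balanced by gender and stereotype more concrete, the following is a minimal sketch of how per-sentence results could be summarized. It is not the repository's code: the `Record` format, the function names, and the choice of simple accuracy gaps are all assumptions made for illustration, and the paper's actual metrics may differ.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One evaluated sentence (hypothetical format, not the repository's)."""
    gold_gender: str       # "male" or "female": annotated gender of the entity
    predicted_gender: str  # gender detected in the system's translation
    stereotypical: bool    # True if the gold gender matches the occupational stereotype


def accuracy(records):
    """Share of records whose predicted gender matches the gold gender."""
    if not records:
        return 0.0
    correct = sum(r.predicted_gender == r.gold_gender for r in records)
    return correct / len(records)


def bias_summary(records):
    """Overall accuracy plus two accuracy gaps (illustrative metrics only)."""
    male = [r for r in records if r.gold_gender == "male"]
    female = [r for r in records if r.gold_gender == "female"]
    pro = [r for r in records if r.stereotypical]
    anti = [r for r in records if not r.stereotypical]
    return {
        "accuracy": accuracy(records),
        # Positive value: male entities are translated correctly more often.
        "gender_gap": accuracy(male) - accuracy(female),
        # Positive value: stereotypical role assignments are easier for the system.
        "stereotype_gap": accuracy(pro) - accuracy(anti),
    }


# Toy data, invented for this example; not taken from WinoMTDE.
records = [
    Record("male", "male", True),
    Record("male", "male", False),
    Record("female", "female", True),
    Record("female", "male", False),
]
summary = bias_summary(records)
print(summary)
```

Because the test set is balanced, a system with no bias would be expected to show gaps near zero on both axes; a large positive gap on either one points to underrepresentation or stereotyping, respectively.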

Code & Models

Repositories

michellekappl/mt_gender_german (official)
